/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_chat gen_config /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo --quantization q4f16_1 --conv-template LM --output /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9 --context-window-size 16384
[2024-01-29 21:59:26] INFO auto_config.py:115: Found model configuration: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/config.json
[2024-01-29 21:59:26] INFO auto_config.py:153: Found model type: llama. Use `--model-type` to override.
[2024-01-29 21:59:26] INFO llama_model.py:51: context_window_size not found in config.json. Falling back to max_position_embeddings (16384)
[2024-01-29 21:59:26] INFO llama_model.py:71: prefill_chunk_size defaults to context_window_size (16384)
[2024-01-29 21:59:26] INFO config.py:106: Overriding context_window_size from 16384 to 16384
[2024-01-29 21:59:26] INFO config.py:106: Overriding max_batch_size from 1 to 80
[2024-01-29 21:59:26] INFO gen_config.py:116: [generation_config.json] Setting bos_token_id: 1
[2024-01-29 21:59:26] INFO gen_config.py:116: [generation_config.json] Setting eos_token_id: 2
[2024-01-29 21:59:26] INFO gen_config.py:128: Found tokenizer config: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/tokenizer.model. Copying to /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9/tokenizer.model
[2024-01-29 21:59:26] INFO gen_config.py:128: Found tokenizer config: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/tokenizer.json. Copying to /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9/tokenizer.json
[2024-01-29 21:59:26] INFO gen_config.py:130: Not found tokenizer config: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/vocab.json
[2024-01-29 21:59:26] INFO gen_config.py:130: Not found tokenizer config: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/merges.txt
[2024-01-29 21:59:26] INFO gen_config.py:130: Not found tokenizer config: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/added_tokens.json
[2024-01-29 21:59:26] INFO gen_config.py:128: Found tokenizer config: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/tokenizer_config.json. Copying to /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9/tokenizer_config.json
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting pad_token_id: 0
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting temperature: 0.7
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting repetition_penalty: 1.0
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting top_p: 0.95
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting mean_gen_len: 128
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting max_gen_len: 512
[2024-01-29 21:59:26] INFO gen_config.py:69: [System default] Setting shift_fill_factor: 0.3
[2024-01-29 21:59:26] INFO gen_config.py:158: Dumping configuration file to: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9/mlc-chat-config.json
/Users/cfruan/miniconda3/envs/mlc-chat-venv/bin/python -m mlc_chat convert_weight /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo --quantization q4f16_1 --source-format auto --output /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9
[2024-01-29 21:59:26] INFO auto_config.py:115: Found model configuration: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/config.json
[2024-01-29 21:59:27] INFO auto_device.py:85: Not found device: cuda:0
[2024-01-29 21:59:27] INFO auto_device.py:85: Not found device: rocm:0
[2024-01-29 21:59:27] INFO auto_device.py:76: Found device: metal:0
[2024-01-29 21:59:28] INFO auto_device.py:85: Not found device: vulkan:0
[2024-01-29 21:59:28] INFO auto_device.py:85: Not found device: opencl:0
[2024-01-29 21:59:28] INFO auto_device.py:33: Using device: metal:0
[2024-01-29 21:59:28] INFO auto_weight.py:70: Finding weights in: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo
[2024-01-29 21:59:28] INFO auto_weight.py:120: Found source weight format: huggingface-torch. Source configuration: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model.bin.index.json
[2024-01-29 21:59:28] INFO auto_weight.py:143: Found source weight format: huggingface-safetensor. Source configuration: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/model.safetensors.index.json
[2024-01-29 21:59:28] INFO auto_weight.py:106: Using source weight configuration: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model.bin.index.json. Use `--source` to override.
[2024-01-29 21:59:28] INFO auto_weight.py:110: Using source weight format: huggingface-torch. Use `--source-format` to override.
[2024-01-29 21:59:28] INFO auto_config.py:153: Found model type: llama. Use `--model-type` to override.
[2024-01-29 21:59:28] INFO llama_model.py:51: context_window_size not found in config.json. Falling back to max_position_embeddings (16384)
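The gen_config step above consolidates the detected model settings, the tokenizer files, and the logged system defaults into the dumped mlc-chat-config.json. As a rough illustration of what ends up in that file (field names and values taken from the log; this is an illustrative subset, not the full schema MLC-LLM writes):

    # Illustrative subset of the values gen_config logged above; the real
    # mlc-chat-config.json written by MLC-LLM contains additional fields.
    import json

    mlc_chat_config_subset = {
        "model_type": "llama",
        "quantization": "q4f16_1",
        "conv_template": "LM",
        "context_window_size": 16384,
        "prefill_chunk_size": 16384,
        "bos_token_id": 1,
        "eos_token_id": 2,
        "pad_token_id": 0,
        "temperature": 0.7,
        "repetition_penalty": 1.0,
        "top_p": 0.95,
        "mean_gen_len": 128,
        "max_gen_len": 512,
        "shift_fill_factor": 0.3,
    }

    print(json.dumps(mlc_chat_config_subset, indent=2))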
[2024-01-29 21:59:28] INFO llama_model.py:71: prefill_chunk_size defaults to context_window_size (16384)
Weight conversion with arguments:
  --config          /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/config.json
  --quantization    GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type      llama
  --device          metal:0
  --source          /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model.bin.index.json
  --source-format   huggingface-torch
  --output          /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9
[2024-01-29 21:59:33] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00007-of-00007.bin
[2024-01-29 21:59:38] INFO group_quantization.py:227: Compiling quantize function for key: ((32000, 8192), float16, metal, axis=1, output_transpose=False)
[2024-01-29 21:59:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.q_weight", shape: (32000, 1024), dtype: uint32
[2024-01-29 21:59:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "lm_head.q_scale", shape: (32000, 256), dtype: float16
[2024-01-29 21:59:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.41.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 21:59:39] INFO group_quantization.py:227: Compiling quantize function for key: ((8192, 22016), float16, metal, axis=1, output_transpose=False)
[2024-01-29 21:59:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.41.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 21:59:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.41.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 21:59:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.41.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 21:59:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.42.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 21:59:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.42.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 21:59:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.42.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 21:59:40] INFO group_quantization.py:227: Compiling quantize function for key: ((44032, 8192), float16, metal, axis=1, output_transpose=False)
[2024-01-29 21:59:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.42.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
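The GroupQuantize arguments above fully determine the quantized shapes reported in the log below: with 4-bit values, eight elements fit in one uint32 storage word (num_elem_per_storage=8) and each group of 32 elements along the quantized axis gets one float16 scale (group_size=32), so an (n, k) float16 weight becomes an (n, k/8) uint32 q_weight plus an (n, k/32) float16 q_scale. A small sketch of that bookkeeping (not MLC-LLM code, just the arithmetic):

    # Sketch of the q4f16_1 shape arithmetic implied by the GroupQuantize
    # arguments above. Illustrative only, not MLC-LLM's implementation.
    def q4f16_1_shapes(n: int, k: int, group_size: int = 32, elems_per_uint32: int = 8):
        q_weight = (n, k // elems_per_uint32)   # packed 4-bit codes, stored as uint32
        q_scale = (n, k // group_size)          # one float16 scale per group
        return q_weight, q_scale

    # Matches the lm_head entry: (32000, 8192) float16 ->
    # q_weight (32000, 1024) uint32 and q_scale (32000, 256) float16.
    assert q4f16_1_shapes(32000, 8192) == ((32000, 1024), (32000, 256))
    # And the down_proj entries: (8192, 22016) -> (8192, 2752) / (8192, 688).
    assert q4f16_1_shapes(8192, 22016) == ((8192, 2752), (8192, 688))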
"[1mmodel.layers.42.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 2%|██▏ | 6/291 [00:07<03:10, 1.49it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.42.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 2%|██▏ | 6/291 [00:07<03:10, 1.49it/s] 2%|██▌ | 7/291 [00:07<03:50, 1.23it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.42.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 2%|██▌ | 7/291 [00:07<03:50, 1.23it/s] [2024-01-29 21:59:41] INFO group_quantization.py:227: Compiling quantize function for key: ((10240, 8192), float16, metal, axis=1, output_transpose=False) 2%|██▌ | 7/291 [00:07<03:50, 1.23it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.42.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 2%|██▌ | 7/291 [00:07<03:50, 1.23it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.42.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 2%|██▌ | 7/291 [00:07<03:50, 1.23it/s] 3%|███▎ | 9/291 [00:07<02:36, 1.80it/s] [2024-01-29 21:59:41] INFO group_quantization.py:227: Compiling quantize function for key: ((8192, 8192), float16, metal, axis=1, output_transpose=False) 3%|███▎ | 9/291 [00:07<02:36, 1.80it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.42.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 3%|███▎ | 9/291 [00:07<02:36, 1.80it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.42.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 3%|███▎ | 9/291 [00:07<02:36, 1.80it/s] 3%|███▋ | 10/291 [00:07<02:16, 2.05it/s] [2024-01-29 21:59:41] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.43.input_layernorm.weight[0m", shape: (8192,), dtype: float16 3%|███▋ | 10/291 [00:07<02:16, 2.05it/s] [2024-01-29 21:59:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 3%|███▋ | 10/291 [00:08<02:16, 2.05it/s] [2024-01-29 21:59:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 3%|███▋ | 10/291 [00:08<02:16, 2.05it/s] 4%|████▍ | 12/291 [00:08<01:50, 2.54it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 4%|████▍ | 12/291 [00:09<01:50, 2.54it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 4%|████▍ | 12/291 [00:09<01:50, 2.54it/s] 4%|████▊ | 13/291 [00:09<02:34, 1.79it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.43.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 4%|████▊ | 13/291 [00:09<02:34, 1.79it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 4%|████▊ | 13/291 [00:09<02:34, 1.79it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 4%|████▊ | 
13/291 [00:09<02:34, 1.79it/s] 5%|█████▌ | 15/291 [00:09<01:49, 2.52it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 5%|█████▌ | 15/291 [00:09<01:49, 2.52it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.43.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 5%|█████▌ | 15/291 [00:09<01:49, 2.52it/s] 5%|█████▉ | 16/291 [00:09<01:36, 2.84it/s] [2024-01-29 21:59:43] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.44.input_layernorm.weight[0m", shape: (8192,), dtype: float16 5%|█████▉ | 16/291 [00:09<01:36, 2.84it/s] [2024-01-29 21:59:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 5%|█████▉ | 16/291 [00:10<01:36, 2.84it/s] [2024-01-29 21:59:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 5%|█████▉ | 16/291 [00:10<01:36, 2.84it/s] 6%|██████▌ | 18/291 [00:10<01:25, 3.21it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 6%|██████▌ | 18/291 [00:11<01:25, 3.21it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 6%|██████▌ | 18/291 [00:11<01:25, 3.21it/s] 7%|██████▉ | 19/291 [00:11<02:15, 2.01it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.44.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 7%|██████▉ | 19/291 [00:11<02:15, 2.01it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 7%|██████▉ | 19/291 [00:11<02:15, 2.01it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 7%|██████▉ | 19/291 [00:11<02:15, 2.01it/s] 7%|███████▋ | 21/291 [00:11<01:37, 2.78it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 7%|███████▋ | 21/291 [00:11<01:37, 2.78it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.44.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 7%|███████▋ | 21/291 [00:11<01:37, 2.78it/s] 8%|████████ | 22/291 [00:11<01:26, 3.10it/s] [2024-01-29 21:59:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.45.input_layernorm.weight[0m", shape: (8192,), dtype: float16 8%|████████ | 22/291 [00:11<01:26, 3.10it/s] [2024-01-29 21:59:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 8%|████████ | 22/291 [00:12<01:26, 3.10it/s] [2024-01-29 21:59:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 8%|████████ | 22/291 [00:12<01:26, 3.10it/s] 8%|████████▊ | 24/291 [00:12<01:18, 3.40it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:121: [Quantized] Parameter: 
"[1mmodel.layers.45.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 8%|████████▊ | 24/291 [00:13<01:18, 3.40it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 8%|████████▊ | 24/291 [00:13<01:18, 3.40it/s] 9%|█████████▏ | 25/291 [00:13<02:09, 2.05it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.45.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 9%|█████████▏ | 25/291 [00:13<02:09, 2.05it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 9%|█████████▏ | 25/291 [00:13<02:09, 2.05it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 9%|█████████▏ | 25/291 [00:13<02:09, 2.05it/s] 9%|█████████▉ | 27/291 [00:13<01:33, 2.82it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 9%|█████████▉ | 27/291 [00:14<01:33, 2.82it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.45.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 9%|█████████▉ | 27/291 [00:14<01:33, 2.82it/s] 10%|██████████▎ | 28/291 [00:14<01:23, 3.14it/s] [2024-01-29 21:59:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.46.input_layernorm.weight[0m", shape: (8192,), dtype: float16 10%|██████████▎ | 28/291 [00:14<01:23, 3.14it/s] [2024-01-29 21:59:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 10%|██████████▎ | 28/291 [00:14<01:23, 3.14it/s] [2024-01-29 21:59:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 10%|██████████▎ | 28/291 [00:14<01:23, 3.14it/s] 10%|███████████ | 30/291 [00:14<01:15, 3.43it/s] [2024-01-29 21:59:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 10%|███████████ | 30/291 [00:15<01:15, 3.43it/s] [2024-01-29 21:59:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 10%|███████████ | 30/291 [00:15<01:15, 3.43it/s] 11%|███████████▍ | 31/291 [00:15<02:06, 2.05it/s] [2024-01-29 21:59:49] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.46.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 11%|███████████▍ | 31/291 [00:15<02:06, 2.05it/s] [2024-01-29 21:59:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 11%|███████████▍ | 31/291 [00:16<02:06, 2.05it/s] [2024-01-29 21:59:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 11%|███████████▍ | 31/291 [00:16<02:06, 2.05it/s] 11%|████████████▏ | 33/291 [00:16<01:31, 2.82it/s] [2024-01-29 21:59:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: 
uint32 11%|████████████▏ | 33/291 [00:16<01:31, 2.82it/s] [2024-01-29 21:59:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.46.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 11%|████████████▏ | 33/291 [00:16<01:31, 2.82it/s] 12%|████████████▌ | 34/291 [00:16<01:21, 3.14it/s] [2024-01-29 21:59:50] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.47.input_layernorm.weight[0m", shape: (8192,), dtype: float16 12%|████████████▌ | 34/291 [00:16<01:21, 3.14it/s] [2024-01-29 21:59:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 12%|████████████▌ | 34/291 [00:16<01:21, 3.14it/s] [2024-01-29 21:59:50] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 12%|████████████▌ | 34/291 [00:16<01:21, 3.14it/s] 12%|█████████████▏ | 36/291 [00:16<01:14, 3.44it/s] [2024-01-29 21:59:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 12%|█████████████▏ | 36/291 [00:18<01:14, 3.44it/s] [2024-01-29 21:59:51] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 12%|█████████████▏ | 36/291 [00:18<01:14, 3.44it/s] 13%|█████████████▌ | 37/291 [00:18<02:04, 2.04it/s] [2024-01-29 21:59:51] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.47.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 13%|█████████████▌ | 37/291 [00:18<02:04, 2.04it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 13%|█████████████▌ | 37/291 [00:18<02:04, 2.04it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 13%|█████████████▌ | 37/291 [00:18<02:04, 2.04it/s] 13%|██████████████▎ | 39/291 [00:18<01:29, 2.82it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 13%|██████████████▎ | 39/291 [00:18<01:29, 2.82it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.47.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 13%|██████████████▎ | 39/291 [00:18<01:29, 2.82it/s] 14%|██████████████▋ | 40/291 [00:18<01:20, 3.13it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.norm.weight[0m", shape: (8192,), dtype: float16 14%|██████████████▋ | 40/291 [00:18<01:20, 3.13it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00007-of-00007.bin 14%|██████████████▋ | 40/291 [00:18<01:20, 3.13it/s] [2024-01-29 21:59:52] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00001-of-00007.bin 14%|██████████████▋ | 40/291 [00:18<01:20, 3.13it/s] [2024-01-29 21:59:56] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.embed_tokens.q_weight[0m", shape: (32000, 1024), dtype: uint32 14%|██████████████▋ | 40/291 [00:22<01:20, 3.13it/s] [2024-01-29 
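Each [Quantized] pair above comes out of the compiled quantize kernels: weights are grouped along the reduction axis, each group of 32 values gets a float16 scale, and the 4-bit codes are packed eight per uint32 (max_int_value=7 in the arguments above). A rough NumPy sketch of one common symmetric packing scheme; this is illustrative only, MLC-LLM's exact encoding and kernels differ:

    # Illustrative group quantization in the spirit of q4f16_1: per-group
    # float16 scales, 4-bit codes packed 8 per uint32. Not MLC-LLM's kernel.
    import numpy as np

    def group_quantize(w: np.ndarray, group_size: int = 32, max_int: int = 7):
        n, k = w.shape
        groups = w.reshape(n, k // group_size, group_size).astype(np.float32)
        scale = np.abs(groups).max(axis=-1) / max_int      # one scale per group
        scale[scale == 0] = 1.0                            # avoid division by zero
        q = np.rint(groups / scale[..., None]).clip(-max_int, max_int)
        nibbles = (q + max_int).astype(np.uint32).reshape(n, k // 8, 8)
        shifts = np.arange(8, dtype=np.uint32) * 4
        q_weight = (nibbles << shifts).sum(axis=-1, dtype=np.uint32)
        return q_weight, scale.astype(np.float16)

    w = np.random.randn(8, 64).astype(np.float16)
    q_weight, q_scale = group_quantize(w)
    print(q_weight.shape, q_weight.dtype, q_scale.shape, q_scale.dtype)
    # (8, 8) uint32 (8, 2) float16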
[2024-01-29 21:59:56] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.embed_tokens.q_scale", shape: (32000, 256), dtype: float16
[2024-01-29 21:59:56] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 21:59:56] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 21:59:56] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 21:59:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 21:59:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 21:59:58] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 21:59:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 21:59:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 21:59:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 21:59:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.0.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 21:59:58] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.1.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 21:59:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 21:59:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:00] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.1.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.1.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:00] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.2.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:02] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.2.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.2.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:02] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.3.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:04] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.3.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.3.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:05] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.4.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:05] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:06] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.4.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.4.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:07] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.5.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:09] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.5.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:09] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.5.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:10] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.6.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:10] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.6.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:11] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.6.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:11] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.6.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:11] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.6.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:11] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.6.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:11] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00001-of-00007.bin
[2024-01-29 22:00:11] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00002-of-00007.bin
[2024-01-29 22:00:14] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.10.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:15] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.10.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
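The Unloading/Loading lines show that the converter streams the checkpoint shard by shard: it loads one pytorch_model-000NN-of-00007.bin, emits every parameter that shard contains (which is why layer numbers jump around), then frees it before opening the next shard. A minimal sketch of that pattern, assuming a hypothetical quantize_param callback in place of the group-quantization step; this is the general idea, not the huggingface_loader implementation:

    # Sketch of shard-by-shard streaming as suggested by the Unloading/Loading
    # log lines above. `quantize_param` is a hypothetical callback; the order
    # shards are visited in the real run follows the converter's own ordering.
    import gc
    import json
    import torch

    def stream_checkpoint(repo_dir: str, quantize_param):
        with open(f"{repo_dir}/pytorch_model.bin.index.json") as f:
            index = json.load(f)
        shards = sorted(set(index["weight_map"].values()))
        for shard in shards:
            state_dict = torch.load(f"{repo_dir}/{shard}", map_location="cpu")
            for name, tensor in state_dict.items():
                yield quantize_param(name, tensor)  # e.g. q_weight / q_scale pairs
            del state_dict                          # unload before the next shard
            gc.collect()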
[Quantized] Parameter: "[1mmodel.layers.10.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 28%|██████████████████████████████▏ | 82/291 [00:41<04:15, 1.22s/it] 29%|██████████████████████████████▌ | 83/291 [00:41<03:31, 1.02s/it] [2024-01-29 22:00:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 29%|██████████████████████████████▌ | 83/291 [00:42<03:31, 1.02s/it] [2024-01-29 22:00:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 29%|██████████████████████████████▌ | 83/291 [00:42<03:31, 1.02s/it] 29%|██████████████████████████████▉ | 84/291 [00:42<03:48, 1.11s/it] [2024-01-29 22:00:16] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.10.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 29%|██████████████████████████████▉ | 84/291 [00:42<03:48, 1.11s/it] [2024-01-29 22:00:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 29%|██████████████████████████████▉ | 84/291 [00:42<03:48, 1.11s/it] [2024-01-29 22:00:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 29%|██████████████████████████████▉ | 84/291 [00:42<03:48, 1.11s/it] 30%|███████████████████████████████▌ | 86/291 [00:42<02:16, 1.50it/s] [2024-01-29 22:00:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 30%|███████████████████████████████▌ | 86/291 [00:43<02:16, 1.50it/s] [2024-01-29 22:00:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.10.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 30%|███████████████████████████████▌ | 86/291 [00:43<02:16, 1.50it/s] 30%|███████████████████████████████▉ | 87/291 [00:43<01:52, 1.81it/s] [2024-01-29 22:00:16] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.11.input_layernorm.weight[0m", shape: (8192,), dtype: float16 30%|███████████████████████████████▉ | 87/291 [00:43<01:52, 1.81it/s] [2024-01-29 22:00:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 30%|███████████████████████████████▉ | 87/291 [00:43<01:52, 1.81it/s] [2024-01-29 22:00:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 30%|███████████████████████████████▉ | 87/291 [00:43<01:52, 1.81it/s] 31%|████████████████████████████████▋ | 89/291 [00:43<01:27, 2.32it/s] [2024-01-29 22:00:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 31%|████████████████████████████████▋ | 89/291 [00:44<01:27, 2.32it/s] [2024-01-29 22:00:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 31%|████████████████████████████████▋ | 89/291 [00:44<01:27, 2.32it/s] 31%|█████████████████████████████████ | 90/291 [00:44<02:02, 1.65it/s] [2024-01-29 22:00:18] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.11.post_attention_layernorm.weight[0m", shape: (8192,), dtype: 
float16 31%|█████████████████████████████████ | 90/291 [00:44<02:02, 1.65it/s] [2024-01-29 22:00:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 31%|█████████████████████████████████ | 90/291 [00:45<02:02, 1.65it/s] [2024-01-29 22:00:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 31%|█████████████████████████████████ | 90/291 [00:45<02:02, 1.65it/s] 32%|█████████████████████████████████▊ | 92/291 [00:45<01:23, 2.37it/s] [2024-01-29 22:00:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 32%|█████████████████████████████████▊ | 92/291 [00:45<01:23, 2.37it/s] [2024-01-29 22:00:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.11.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 32%|█████████████████████████████████▊ | 92/291 [00:45<01:23, 2.37it/s] 32%|██████████████████████████████████▏ | 93/291 [00:45<01:13, 2.70it/s] [2024-01-29 22:00:19] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.12.input_layernorm.weight[0m", shape: (8192,), dtype: float16 32%|██████████████████████████████████▏ | 93/291 [00:45<01:13, 2.70it/s] [2024-01-29 22:00:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 32%|██████████████████████████████████▏ | 93/291 [00:45<01:13, 2.70it/s] [2024-01-29 22:00:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 32%|██████████████████████████████████▏ | 93/291 [00:45<01:13, 2.70it/s] 33%|██████████████████████████████████▉ | 95/291 [00:45<01:04, 3.02it/s] [2024-01-29 22:00:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 33%|██████████████████████████████████▉ | 95/291 [00:47<01:04, 3.02it/s] [2024-01-29 22:00:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 33%|██████████████████████████████████▉ | 95/291 [00:47<01:04, 3.02it/s] 33%|███████████████████████████████████▎ | 96/291 [00:47<01:48, 1.80it/s] [2024-01-29 22:00:21] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.12.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 33%|███████████████████████████████████▎ | 96/291 [00:47<01:48, 1.80it/s] [2024-01-29 22:00:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 33%|███████████████████████████████████▎ | 96/291 [00:47<01:48, 1.80it/s] [2024-01-29 22:00:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 33%|███████████████████████████████████▎ | 96/291 [00:47<01:48, 1.80it/s] 34%|████████████████████████████████████ | 98/291 [00:47<01:16, 2.53it/s] [2024-01-29 22:00:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 34%|████████████████████████████████████ | 98/291 [00:47<01:16, 2.53it/s] [2024-01-29 22:00:21] INFO 
huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.12.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 34%|████████████████████████████████████ | 98/291 [00:47<01:16, 2.53it/s] 34%|████████████████████████████████████▍ | 99/291 [00:47<01:07, 2.85it/s] [2024-01-29 22:00:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 34%|████████████████████████████████████▍ | 99/291 [00:48<01:07, 2.85it/s] [2024-01-29 22:00:22] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 34%|████████████████████████████████████▍ | 99/291 [00:49<01:07, 2.85it/s] 34%|████████████████████████████████████▍ | 100/291 [00:49<01:49, 1.74it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 34%|████████████████████████████████████▍ | 100/291 [00:49<01:49, 1.74it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 34%|████████████████████████████████████▍ | 100/291 [00:49<01:49, 1.74it/s] 35%|████████████████████████████████████▊ | 101/291 [00:49<01:34, 2.01it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 35%|████████████████████████████████████▊ | 101/291 [00:49<01:34, 2.01it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.13.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 35%|████████████████████████████████████▊ | 101/291 [00:49<01:34, 2.01it/s] 35%|█████████████████████████████████████▏ | 102/291 [00:49<01:18, 2.40it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.6.input_layernorm.weight[0m", shape: (8192,), dtype: float16 35%|█████████████████████████████████████▏ | 102/291 [00:49<01:18, 2.40it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 35%|█████████████████████████████████████▏ | 102/291 [00:49<01:18, 2.40it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.6.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 35%|█████████████████████████████████████▏ | 102/291 [00:49<01:18, 2.40it/s] 36%|█████████████████████████████████████▉ | 104/291 [00:49<01:04, 2.92it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.6.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 36%|█████████████████████████████████████▉ | 104/291 [00:49<01:04, 2.92it/s] [2024-01-29 22:00:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.7.input_layernorm.weight[0m", shape: (8192,), dtype: float16 36%|█████████████████████████████████████▉ | 104/291 [00:49<01:04, 2.92it/s] [2024-01-29 22:00:24] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 36%|█████████████████████████████████████▉ | 104/291 [00:50<01:04, 2.92it/s] [2024-01-29 22:00:24] INFO huggingface_loader.py:121: [Quantized] Parameter: 
"[1mmodel.layers.7.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 36%|█████████████████████████████████████▉ | 104/291 [00:50<01:04, 2.92it/s] 37%|██████████████████████████████████████▉ | 107/291 [00:50<00:47, 3.88it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 37%|██████████████████████████████████████▉ | 107/291 [00:51<00:47, 3.88it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 37%|██████████████████████████████████████▉ | 107/291 [00:51<00:47, 3.88it/s] 37%|███████████████████████████████████████▎ | 108/291 [00:51<01:21, 2.26it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.7.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 37%|███████████████████████████████████████▎ | 108/291 [00:51<01:21, 2.26it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 37%|███████████████████████████████████████▎ | 108/291 [00:51<01:21, 2.26it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 37%|███████████████████████████████████████▎ | 108/291 [00:51<01:21, 2.26it/s] 38%|████████████████████████████████████████ | 110/291 [00:51<01:00, 2.97it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 38%|████████████████████████████████████████ | 110/291 [00:52<01:00, 2.97it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.7.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 38%|████████████████████████████████████████ | 110/291 [00:52<01:00, 2.97it/s] 38%|████████████████████████████████████████▍ | 111/291 [00:52<00:55, 3.26it/s] [2024-01-29 22:00:25] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.8.input_layernorm.weight[0m", shape: (8192,), dtype: float16 38%|████████████████████████████████████████▍ | 111/291 [00:52<00:55, 3.26it/s] [2024-01-29 22:00:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 38%|████████████████████████████████████████▍ | 111/291 [00:52<00:55, 3.26it/s] [2024-01-29 22:00:26] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 38%|████████████████████████████████████████▍ | 111/291 [00:52<00:55, 3.26it/s] 39%|█████████████████████████████████████████▏ | 113/291 [00:52<00:50, 3.50it/s] [2024-01-29 22:00:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 39%|█████████████████████████████████████████▏ | 113/291 [00:53<00:50, 3.50it/s] [2024-01-29 22:00:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.8.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 39%|█████████████████████████████████████████▏ | 113/291 [00:53<00:50, 3.50it/s] 39%|█████████████████████████████████████████▌ | 114/291 [00:53<01:24, 2.10it/s] [2024-01-29 
22:00:27] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.8.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.8.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.8.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.8.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.8.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:28] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.9.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:28] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:29] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:29] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.9.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:30] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:30] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:30] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:30] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.9.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:30] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00002-of-00007.bin
[2024-01-29 22:00:30] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00003-of-00007.bin
[2024-01-29 22:00:33] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.13.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:34] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.13.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:34] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.13.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:34] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.13.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:34] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.14.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:34] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:34] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:35] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:35] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
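The q_weight/q_scale pairs above follow the usual q4f16_1 layout: 4-bit weights packed eight per uint32 word, plus one float16 scale per group of 32 weights. A minimal sketch of that shape arithmetic, consistent with the shapes logged here (the original fp16 shapes (10240, 8192) and (8192, 22016) are inferred from the packed shapes, and the helper name is hypothetical; this is an illustration, not the mlc_chat implementation):

```python
# Illustrative only: relate an fp16 weight shape to the q4f16_1 q_weight /
# q_scale shapes reported in the log. Assumes 4-bit weights packed 8 per
# uint32 and one float16 scale per group of 32 weights.
def q4f16_1_shapes(rows, cols, bits=4, group_size=32, word_bits=32):
    per_word = word_bits // bits             # 8 nibbles per uint32
    q_weight = (rows, cols // per_word)      # packed 4-bit payload, dtype uint32
    q_scale = (rows, cols // group_size)     # per-group scales, dtype float16
    return q_weight, q_scale

# qkv_proj: inferred fp16 (10240, 8192) -> (10240, 1024) uint32 + (10240, 256) float16
print(q4f16_1_shapes(10240, 8192))
# mlp.down_proj: inferred fp16 (8192, 22016) -> (8192, 2752) uint32 + (8192, 688) float16
print(q4f16_1_shapes(8192, 22016))
```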
[2024-01-29 22:00:35] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.14.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.14.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:36] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.15.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:36] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.15.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.15.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:38] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.16.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:40] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.16.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.16.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:40] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.17.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:41] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:42] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.17.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.17.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:42] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.18.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:44] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.18.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.18.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:45] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.19.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:47] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.19.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.19.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:49] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:49] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00003-of-00007.bin
[2024-01-29 22:00:49] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00004-of-00007.bin
[2024-01-29 22:00:52] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:53] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:53] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.20.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:53] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:53] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.21.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:53] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:53] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:54] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:54] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
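The Unloading/Loading lines show the converter streaming through the checkpoint one shard at a time: a pytorch_model-*.bin file is read, its tensors are quantized, and the shard is released before the next one is opened, so peak host memory stays near a single shard. A minimal sketch of that pattern (the helper name quantize_param is hypothetical; this illustrates the streaming idea, not the actual huggingface_loader implementation):

```python
# Illustrative only: convert a sharded HF checkpoint shard-by-shard so that
# only one .bin file's tensors are resident at a time.
import gc
import torch

def convert_shards(shard_paths, quantize_param):
    for path in shard_paths:
        print(f"Loading HF parameters from: {path}")
        shard = torch.load(path, map_location="cpu")    # dict of fp16 tensors
        for name, tensor in shard.items():
            yield from quantize_param(name, tensor)     # e.g. emits q_weight, q_scale
        print(f"Unloading HF weight file: {path}")
        del shard
        gc.collect()                                    # free the shard before loading the next
```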
[2024-01-29 22:00:54] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.21.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:55] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:55] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:55] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:55] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.21.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:55] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.22.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:55] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:55] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:57] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:57] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:57] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.22.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:57] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:57] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:57] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:57] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.22.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:57] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.23.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:00:58] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:00:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:00:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:00:59] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.23.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:00:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:00:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:00:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:00:59] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.23.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:00:59] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.24.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:00] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:01] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.24.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:01] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.24.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:01] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.25.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:02] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:03] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:03] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.25.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.25.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:04] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.26.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:04] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:06] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.26.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:06] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.26.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:07] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:08] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:08] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00004-of-00007.bin
[2024-01-29 22:01:08] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00005-of-00007.bin
[2024-01-29 22:01:11] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.27.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:12] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:12] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.27.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
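The logged shapes also give a direct sense of the storage saving per tensor. Taking one mlp.gate_up_proj as an example, and assuming its original fp16 shape is (44032, 8192) as implied by the packed shapes above, a rough calculation looks like this (illustrative arithmetic only):

```python
# Rough size arithmetic for one mlp.gate_up_proj under q4f16_1, using the
# shapes from the log; the fp16 original shape (44032, 8192) is inferred.
fp16_bytes = 44032 * 8192 * 2               # ~721.4 MB as float16
q_weight_bytes = 44032 * 1024 * 4           # packed 4-bit payload stored as uint32
q_scale_bytes = 44032 * 256 * 2             # float16 group scales
q_total = q_weight_bytes + q_scale_bytes    # ~202.9 MB
print(fp16_bytes / q_total)                 # ~3.56x smaller, i.e. ~4.5 effective bits per weight
```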
[2024-01-29 22:01:12] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.27.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:12] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.28.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:12] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:12] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:14] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.28.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.28.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:14] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.29.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:14] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:16] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.29.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:16] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.29.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:16] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.30.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:17] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:18] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.30.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:18] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.30.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:18] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.31.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:19] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:20] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:20] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.31.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.31.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:21] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.32.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:21] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.32.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:23] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.32.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:23] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.33.input_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:24] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_weight", shape: (8192, 2752), dtype: uint32
[2024-01-29 22:01:24] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.mlp.down_proj.q_scale", shape: (8192, 688), dtype: float16
[2024-01-29 22:01:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:25] INFO huggingface_loader.py:129: [Not quantized] Parameter: "model.layers.33.post_attention_layernorm.weight", shape: (8192,), dtype: float16
[2024-01-29 22:01:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[2024-01-29 22:01:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.self_attn.qkv_proj.q_scale", shape: (10240, 256), dtype: float16
[2024-01-29 22:01:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_weight", shape: (8192, 1024), dtype: uint32
[2024-01-29 22:01:25] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.33.self_attn.o_proj.q_scale", shape: (8192, 256), dtype: float16
[2024-01-29 22:01:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_weight", shape: (44032, 1024), dtype: uint32
[2024-01-29 22:01:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.34.mlp.gate_up_proj.q_scale", shape: (44032, 256), dtype: float16
[2024-01-29 22:01:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "model.layers.34.self_attn.qkv_proj.q_weight", shape: (10240, 1024), dtype: uint32
[01:53<00:24, 1.77it/s] [2024-01-29 22:01:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.34.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 85%|█████████████████████████████████████████████████████████████████████████████████████████▉ | 247/291 [01:53<00:24, 1.77it/s] 85%|██████████████████████████████████████████████████████████████████████████████████████████▎ | 248/291 [01:53<00:21, 2.03it/s] [2024-01-29 22:01:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.34.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 85%|██████████████████████████████████████████████████████████████████████████████████████████▎ | 248/291 [01:53<00:21, 2.03it/s] [2024-01-29 22:01:27] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.34.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 85%|██████████████████████████████████████████████████████████████████████████████████████████▎ | 248/291 [01:53<00:21, 2.03it/s] 86%|██████████████████████████████████████████████████████████████████████████████████████████▋ | 249/291 [01:53<00:17, 2.43it/s] [2024-01-29 22:01:27] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00005-of-00007.bin 86%|██████████████████████████████████████████████████████████████████████████████████████████▋ | 249/291 [01:53<00:17, 2.43it/s] [2024-01-29 22:01:27] INFO huggingface_loader.py:169: Loading HF parameters from: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00006-of-00007.bin 86%|██████████████████████████████████████████████████████████████████████████████████████████▋ | 249/291 [01:54<00:17, 2.43it/s] [2024-01-29 22:01:30] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.34.input_layernorm.weight[0m", shape: (8192,), dtype: float16 86%|██████████████████████████████████████████████████████████████████████████████████████████▋ | 249/291 [01:57<00:17, 2.43it/s] 86%|███████████████████████████████████████████████████████████████████████████████████████████ | 250/291 [01:57<00:49, 1.20s/it] [2024-01-29 22:01:31] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.34.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 86%|███████████████████████████████████████████████████████████████████████████████████████████ | 250/291 [01:57<00:49, 1.20s/it] [2024-01-29 22:01:31] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.34.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 86%|███████████████████████████████████████████████████████████████████████████████████████████ | 250/291 [01:57<00:49, 1.20s/it] 86%|███████████████████████████████████████████████████████████████████████████████████████████▍ | 251/291 [01:57<00:40, 1.00s/it] [2024-01-29 22:01:31] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.34.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 86%|███████████████████████████████████████████████████████████████████████████████████████████▍ | 251/291 [01:57<00:40, 1.00s/it] [2024-01-29 22:01:31] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.35.input_layernorm.weight[0m", shape: (8192,), dtype: float16 86%|███████████████████████████████████████████████████████████████████████████████████████████▍ | 251/291 [01:57<00:40, 1.00s/it] [2024-01-29 22:01:31] INFO 
huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 86%|███████████████████████████████████████████████████████████████████████████████████████████▍ | 251/291 [01:58<00:40, 1.00s/it] [2024-01-29 22:01:31] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 86%|███████████████████████████████████████████████████████████████████████████████████████████▍ | 251/291 [01:58<00:40, 1.00s/it] 87%|████████████████████████████████████████████████████████████████████████████████████████████▌ | 254/291 [01:58<00:20, 1.83it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 87%|████████████████████████████████████████████████████████████████████████████████████████████▌ | 254/291 [01:59<00:20, 1.83it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 87%|████████████████████████████████████████████████████████████████████████████████████████████▌ | 254/291 [01:59<00:20, 1.83it/s] 88%|████████████████████████████████████████████████████████████████████████████████████████████▉ | 255/291 [01:59<00:25, 1.41it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.35.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 88%|████████████████████████████████████████████████████████████████████████████████████████████▉ | 255/291 [01:59<00:25, 1.41it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 88%|████████████████████████████████████████████████████████████████████████████████████████████▉ | 255/291 [01:59<00:25, 1.41it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 88%|████████████████████████████████████████████████████████████████████████████████████████████▉ | 255/291 [01:59<00:25, 1.41it/s] 88%|█████████████████████████████████████████████████████████████████████████████████████████████▌ | 257/291 [01:59<00:16, 2.02it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 88%|█████████████████████████████████████████████████████████████████████████████████████████████▌ | 257/291 [01:59<00:16, 2.02it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.35.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 88%|█████████████████████████████████████████████████████████████████████████████████████████████▌ | 257/291 [01:59<00:16, 2.02it/s] 89%|█████████████████████████████████████████████████████████████████████████████████████████████▉ | 258/291 [01:59<00:14, 2.32it/s] [2024-01-29 22:01:33] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.36.input_layernorm.weight[0m", shape: (8192,), dtype: float16 89%|█████████████████████████████████████████████████████████████████████████████████████████████▉ | 258/291 [01:59<00:14, 2.32it/s] [2024-01-29 22:01:34] INFO huggingface_loader.py:121: [Quantized] Parameter: 
"[1mmodel.layers.36.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 89%|█████████████████████████████████████████████████████████████████████████████████████████████▉ | 258/291 [02:00<00:14, 2.32it/s] [2024-01-29 22:01:34] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 89%|█████████████████████████████████████████████████████████████████████████████████████████████▉ | 258/291 [02:00<00:14, 2.32it/s] 89%|██████████████████████████████████████████████████████████████████████████████████████████████▋ | 260/291 [02:00<00:11, 2.77it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 89%|██████████████████████████████████████████████████████████████████████████████████████████████▋ | 260/291 [02:03<00:11, 2.77it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 89%|██████████████████████████████████████████████████████████████████████████████████████████████▋ | 260/291 [02:03<00:11, 2.77it/s] 90%|███████████████████████████████████████████████████████████████████████████████████████████████ | 261/291 [02:03<00:28, 1.05it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.36.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 90%|███████████████████████████████████████████████████████████████████████████████████████████████ | 261/291 [02:03<00:28, 1.05it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 90%|███████████████████████████████████████████████████████████████████████████████████████████████ | 261/291 [02:03<00:28, 1.05it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 90%|███████████████████████████████████████████████████████████████████████████████████████████████ | 261/291 [02:03<00:28, 1.05it/s] 90%|███████████████████████████████████████████████████████████████████████████████████████████████▊ | 263/291 [02:03<00:18, 1.55it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 90%|███████████████████████████████████████████████████████████████████████████████████████████████▊ | 263/291 [02:04<00:18, 1.55it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.36.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 90%|███████████████████████████████████████████████████████████████████████████████████████████████▊ | 263/291 [02:04<00:18, 1.55it/s] 91%|████████████████████████████████████████████████████████████████████████████████████████████████▏ | 264/291 [02:04<00:14, 1.83it/s] [2024-01-29 22:01:37] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.37.input_layernorm.weight[0m", shape: (8192,), dtype: float16 91%|████████████████████████████████████████████████████████████████████████████████████████████████▏ | 264/291 [02:04<00:14, 1.83it/s] [2024-01-29 22:01:38] INFO huggingface_loader.py:121: [Quantized] Parameter: 
"[1mmodel.layers.37.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 91%|████████████████████████████████████████████████████████████████████████████████████████████████▏ | 264/291 [02:04<00:14, 1.83it/s] [2024-01-29 22:01:38] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 91%|████████████████████████████████████████████████████████████████████████████████████████████████▏ | 264/291 [02:04<00:14, 1.83it/s] 91%|████████████████████████████████████████████████████████████████████████████████████████████████▉ | 266/291 [02:04<00:11, 2.27it/s] [2024-01-29 22:01:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 91%|████████████████████████████████████████████████████████████████████████████████████████████████▉ | 266/291 [02:05<00:11, 2.27it/s] [2024-01-29 22:01:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 91%|████████████████████████████████████████████████████████████████████████████████████████████████▉ | 266/291 [02:05<00:11, 2.27it/s] 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 267/291 [02:05<00:15, 1.59it/s] [2024-01-29 22:01:39] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.37.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 267/291 [02:05<00:15, 1.59it/s] [2024-01-29 22:01:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 267/291 [02:06<00:15, 1.59it/s] [2024-01-29 22:01:39] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▎ | 267/291 [02:06<00:15, 1.59it/s] 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▉ | 269/291 [02:06<00:09, 2.27it/s] [2024-01-29 22:01:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▉ | 269/291 [02:06<00:09, 2.27it/s] [2024-01-29 22:01:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.37.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 92%|█████████████████████████████████████████████████████████████████████████████████████████████████▉ | 269/291 [02:06<00:09, 2.27it/s] 93%|██████████████████████████████████████████████████████████████████████████████████████████████████▎ | 270/291 [02:06<00:08, 2.58it/s] [2024-01-29 22:01:40] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.38.input_layernorm.weight[0m", shape: (8192,), dtype: float16 93%|██████████████████████████████████████████████████████████████████████████████████████████████████▎ | 270/291 [02:06<00:08, 2.58it/s] [2024-01-29 22:01:40] INFO huggingface_loader.py:121: [Quantized] Parameter: 
"[1mmodel.layers.38.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 93%|██████████████████████████████████████████████████████████████████████████████████████████████████▎ | 270/291 [02:06<00:08, 2.58it/s] [2024-01-29 22:01:40] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 93%|██████████████████████████████████████████████████████████████████████████████████████████████████▎ | 270/291 [02:06<00:08, 2.58it/s] 93%|███████████████████████████████████████████████████████████████████████████████████████████████████ | 272/291 [02:06<00:06, 2.93it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 93%|███████████████████████████████████████████████████████████████████████████████████████████████████ | 272/291 [02:08<00:06, 2.93it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 93%|███████████████████████████████████████████████████████████████████████████████████████████████████ | 272/291 [02:08<00:06, 2.93it/s] 94%|███████████████████████████████████████████████████████████████████████████████████████████████████▍ | 273/291 [02:08<00:09, 1.82it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.38.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 94%|███████████████████████████████████████████████████████████████████████████████████████████████████▍ | 273/291 [02:08<00:09, 1.82it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 94%|███████████████████████████████████████████████████████████████████████████████████████████████████▍ | 273/291 [02:08<00:09, 1.82it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 94%|███████████████████████████████████████████████████████████████████████████████████████████████████▍ | 273/291 [02:08<00:09, 1.82it/s] 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 275/291 [02:08<00:06, 2.54it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 275/291 [02:08<00:06, 2.54it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.38.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▏ | 275/291 [02:08<00:06, 2.54it/s] 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 276/291 [02:08<00:05, 2.86it/s] [2024-01-29 22:01:42] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.39.input_layernorm.weight[0m", shape: (8192,), dtype: float16 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 276/291 [02:08<00:05, 2.86it/s] [2024-01-29 22:01:43] INFO huggingface_loader.py:121: 
[Quantized] Parameter: "[1mmodel.layers.39.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 276/291 [02:09<00:05, 2.86it/s] [2024-01-29 22:01:43] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 95%|████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 276/291 [02:09<00:05, 2.86it/s] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 278/291 [02:09<00:04, 3.14it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 278/291 [02:10<00:04, 3.14it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 278/291 [02:10<00:04, 3.14it/s] 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 279/291 [02:10<00:06, 1.88it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.39.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 279/291 [02:10<00:06, 1.88it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 279/291 [02:10<00:06, 1.88it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 96%|█████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 279/291 [02:10<00:06, 1.88it/s] 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 281/291 [02:10<00:03, 2.62it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 281/291 [02:10<00:03, 2.62it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.39.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▎ | 281/291 [02:10<00:03, 2.62it/s] 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 282/291 [02:10<00:03, 2.93it/s] [2024-01-29 22:01:44] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.40.input_layernorm.weight[0m", shape: (8192,), dtype: float16 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 282/291 [02:10<00:03, 2.93it/s] 
[2024-01-29 22:01:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.mlp.down_proj.q_weight[0m", shape: (8192, 2752), dtype: uint32 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 282/291 [02:11<00:03, 2.93it/s] [2024-01-29 22:01:45] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.mlp.down_proj.q_scale[0m", shape: (8192, 688), dtype: float16 97%|██████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 282/291 [02:11<00:03, 2.93it/s] 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 284/291 [02:11<00:02, 3.19it/s] [2024-01-29 22:01:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 284/291 [02:12<00:02, 3.19it/s] [2024-01-29 22:01:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▍ | 284/291 [02:12<00:02, 3.19it/s] 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 285/291 [02:12<00:03, 1.88it/s] [2024-01-29 22:01:46] INFO huggingface_loader.py:129: [Not quantized] Parameter: "[1mmodel.layers.40.post_attention_layernorm.weight[0m", shape: (8192,), dtype: float16 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 285/291 [02:12<00:03, 1.88it/s] [2024-01-29 22:01:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 285/291 [02:13<00:03, 1.88it/s] [2024-01-29 22:01:46] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 98%|███████████████████████████████████████████████████████████████████████████████████████████████████████▊ | 285/291 [02:13<00:03, 1.88it/s] 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 287/291 [02:13<00:01, 2.61it/s] [2024-01-29 22:01:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 287/291 [02:13<00:01, 2.61it/s] [2024-01-29 22:01:47] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.40.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████▌ | 287/291 [02:13<00:01, 2.61it/s] 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 288/291 [02:13<00:01, 2.93it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.41.mlp.gate_up_proj.q_weight[0m", shape: (44032, 1024), dtype: uint32 
99%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 288/291 [02:14<00:01, 2.93it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.41.mlp.gate_up_proj.q_scale[0m", shape: (44032, 256), dtype: float16 99%|████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 288/291 [02:14<00:01, 2.93it/s] 99%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 289/291 [02:14<00:01, 1.77it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.41.self_attn.qkv_proj.q_weight[0m", shape: (10240, 1024), dtype: uint32 99%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 289/291 [02:14<00:01, 1.77it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.41.self_attn.qkv_proj.q_scale[0m", shape: (10240, 256), dtype: float16 99%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▎| 289/291 [02:14<00:01, 1.77it/s] 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 290/291 [02:14<00:00, 2.03it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.41.self_attn.o_proj.q_weight[0m", shape: (8192, 1024), dtype: uint32 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 290/291 [02:15<00:00, 2.03it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:121: [Quantized] Parameter: "[1mmodel.layers.41.self_attn.o_proj.q_scale[0m", shape: (8192, 256), dtype: float16 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▋| 290/291 [02:15<00:00, 2.03it/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [02:15<00:00, 2.43it/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 291/291 [02:15<00:00, 2.15it/s] [2024-01-29 22:01:48] INFO huggingface_loader.py:181: Unloading HF weight file: /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmphxbtotbe/repo/pytorch_model-00006-of-00007.bin [2024-01-29 22:01:49] INFO stats.py:76: [92mTime usage[0m: HF loading: 23.645 sec; Pre-quantization mapping: 97.650 sec; Quantization: 2.004 sec [2024-01-29 22:01:49] INFO stats.py:90: [92mRAM usage[0m: Peak RAM: 18.352 GB. 
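The [Quantized] / [Not quantized] pairs above show the q4f16_1 layout: each float16 weight matrix is replaced by a uint32 q_weight tensor plus a float16 q_scale tensor, while the small layernorm vectors pass through unchanged. The shape arithmetic is sketched below, assuming q4f16_1 means 4-bit group quantization with a group size of 32 and eight 4-bit values packed into each uint32 word, which is what the logged shapes suggest; the helper is illustrative only, not an mlc_chat API.

def q4f16_1_shapes(rows, cols, group_size=32, nibbles_per_word=8):
    """Packed shapes for a (rows, cols) float16 matrix under the assumed q4f16_1 layout."""
    assert cols % group_size == 0 and cols % nibbles_per_word == 0
    q_weight = (rows, cols // nibbles_per_word)  # uint32: eight 4-bit values per word
    q_scale = (rows, cols // group_size)         # float16: one scale per 32-weight group
    return q_weight, q_scale

# Reproduces the shapes logged above for this 8192-hidden model:
print(q4f16_1_shapes(8192, 22016))   # mlp.down_proj      -> ((8192, 2752), (8192, 688))
print(q4f16_1_shapes(44032, 8192))   # mlp.gate_up_proj   -> ((44032, 1024), (44032, 256))
print(q4f16_1_shapes(10240, 8192))   # self_attn.qkv_proj -> ((10240, 1024), (10240, 256))
print(q4f16_1_shapes(8192, 8192))    # self_attn.o_proj   -> ((8192, 1024), (8192, 256))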
[2024-01-29 22:01:49] INFO convert_weight.py:121: Parameter size after quantization: 17.678 GB
[2024-01-29 22:01:49] INFO convert_weight.py:126: Total parameters: 33,743,970,304
[2024-01-29 22:01:49] INFO convert_weight.py:127: Bits per parameter: 4.500
Start storing to cache /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9
[0001/0485] saving lm_head.q_weight [0002/0485] saving lm_head.q_scale [0003/0485] saving model.layers.41.input_layernorm.weight [0004/0485] saving model.layers.41.mlp.down_proj.q_weight [0005/0485] saving model.layers.41.mlp.down_proj.q_scale [0006/0485] saving model.layers.41.post_attention_layernorm.weight [0007/0485] saving model.layers.42.input_layernorm.weight [0008/0485] saving model.layers.42.mlp.down_proj.q_weight [0009/0485] saving model.layers.42.mlp.down_proj.q_scale [0010/0485] saving model.layers.42.mlp.gate_up_proj.q_weight [0011/0485] saving model.layers.42.mlp.gate_up_proj.q_scale [0012/0485] saving model.layers.42.post_attention_layernorm.weight [0013/0485] saving model.layers.42.self_attn.qkv_proj.q_weight [0014/0485] saving model.layers.42.self_attn.qkv_proj.q_scale [0015/0485] saving model.layers.42.self_attn.o_proj.q_weight [0016/0485] saving model.layers.42.self_attn.o_proj.q_scale [0017/0485] saving model.layers.43.input_layernorm.weight [0018/0485] saving model.layers.43.mlp.down_proj.q_weight [0019/0485] saving model.layers.43.mlp.down_proj.q_scale [0020/0485] saving model.layers.43.mlp.gate_up_proj.q_weight [0021/0485] saving model.layers.43.mlp.gate_up_proj.q_scale [0022/0485] saving model.layers.43.post_attention_layernorm.weight [0023/0485] saving model.layers.43.self_attn.qkv_proj.q_weight [0024/0485] saving model.layers.43.self_attn.qkv_proj.q_scale [0025/0485] saving model.layers.43.self_attn.o_proj.q_weight [0026/0485] saving model.layers.43.self_attn.o_proj.q_scale [0027/0485] saving model.layers.44.input_layernorm.weight [0028/0485] saving model.layers.44.mlp.down_proj.q_weight [0029/0485] saving model.layers.44.mlp.down_proj.q_scale [0030/0485] saving model.layers.44.mlp.gate_up_proj.q_weight [0031/0485] saving model.layers.44.mlp.gate_up_proj.q_scale [0032/0485] saving model.layers.44.post_attention_layernorm.weight [0033/0485] saving model.layers.44.self_attn.qkv_proj.q_weight [0034/0485] saving model.layers.44.self_attn.qkv_proj.q_scale [0035/0485] saving model.layers.44.self_attn.o_proj.q_weight [0036/0485] saving model.layers.44.self_attn.o_proj.q_scale [0037/0485] saving model.layers.45.input_layernorm.weight [0038/0485] saving model.layers.45.mlp.down_proj.q_weight [0039/0485] saving model.layers.45.mlp.down_proj.q_scale [0040/0485] saving model.layers.45.mlp.gate_up_proj.q_weight [0041/0485] saving model.layers.45.mlp.gate_up_proj.q_scale [0042/0485] saving model.layers.45.post_attention_layernorm.weight [0043/0485] saving model.layers.45.self_attn.qkv_proj.q_weight [0044/0485] saving model.layers.45.self_attn.qkv_proj.q_scale [0045/0485] saving model.layers.45.self_attn.o_proj.q_weight [0046/0485] saving model.layers.45.self_attn.o_proj.q_scale [0047/0485] saving model.layers.46.input_layernorm.weight [0048/0485] saving model.layers.46.mlp.down_proj.q_weight [0049/0485] saving model.layers.46.mlp.down_proj.q_scale [0050/0485] saving model.layers.46.mlp.gate_up_proj.q_weight [0051/0485] saving model.layers.46.mlp.gate_up_proj.q_scale [0052/0485] saving model.layers.46.post_attention_layernorm.weight [0053/0485] saving
model.layers.46.self_attn.qkv_proj.q_weight [0054/0485] saving model.layers.46.self_attn.qkv_proj.q_scale [0055/0485] saving model.layers.46.self_attn.o_proj.q_weight [0056/0485] saving model.layers.46.self_attn.o_proj.q_scale [0057/0485] saving model.layers.47.input_layernorm.weight [0058/0485] saving model.layers.47.mlp.down_proj.q_weight [0059/0485] saving model.layers.47.mlp.down_proj.q_scale [0060/0485] saving model.layers.47.mlp.gate_up_proj.q_weight [0061/0485] saving model.layers.47.mlp.gate_up_proj.q_scale [0062/0485] saving model.layers.47.post_attention_layernorm.weight [0063/0485] saving model.layers.47.self_attn.qkv_proj.q_weight [0064/0485] saving model.layers.47.self_attn.qkv_proj.q_scale [0065/0485] saving model.layers.47.self_attn.o_proj.q_weight [0066/0485] saving model.layers.47.self_attn.o_proj.q_scale [0067/0485] saving model.norm.weight [0068/0485] saving model.embed_tokens.q_weight [0069/0485] saving model.embed_tokens.q_scale [0070/0485] saving model.layers.0.input_layernorm.weight [0071/0485] saving model.layers.0.mlp.down_proj.q_weight [0072/0485] saving model.layers.0.mlp.down_proj.q_scale [0073/0485] saving model.layers.0.mlp.gate_up_proj.q_weight [0074/0485] saving model.layers.0.mlp.gate_up_proj.q_scale [0075/0485] saving model.layers.0.post_attention_layernorm.weight [0076/0485] saving model.layers.0.self_attn.qkv_proj.q_weight [0077/0485] saving model.layers.0.self_attn.qkv_proj.q_scale [0078/0485] saving model.layers.0.self_attn.o_proj.q_weight [0079/0485] saving model.layers.0.self_attn.o_proj.q_scale [0080/0485] saving model.layers.1.input_layernorm.weight [0081/0485] saving model.layers.1.mlp.down_proj.q_weight [0082/0485] saving model.layers.1.mlp.down_proj.q_scale [0083/0485] saving model.layers.1.mlp.gate_up_proj.q_weight [0084/0485] saving model.layers.1.mlp.gate_up_proj.q_scale [0085/0485] saving model.layers.1.post_attention_layernorm.weight [0086/0485] saving model.layers.1.self_attn.qkv_proj.q_weight [0087/0485] saving model.layers.1.self_attn.qkv_proj.q_scale [0088/0485] saving model.layers.1.self_attn.o_proj.q_weight [0089/0485] saving model.layers.1.self_attn.o_proj.q_scale [0090/0485] saving model.layers.2.input_layernorm.weight [0091/0485] saving model.layers.2.mlp.down_proj.q_weight [0092/0485] saving model.layers.2.mlp.down_proj.q_scale [0093/0485] saving model.layers.2.mlp.gate_up_proj.q_weight [0094/0485] saving model.layers.2.mlp.gate_up_proj.q_scale [0095/0485] saving model.layers.2.post_attention_layernorm.weight [0096/0485] saving model.layers.2.self_attn.qkv_proj.q_weight [0097/0485] saving model.layers.2.self_attn.qkv_proj.q_scale [0098/0485] saving model.layers.2.self_attn.o_proj.q_weight [0099/0485] saving model.layers.2.self_attn.o_proj.q_scale [0100/0485] saving model.layers.3.input_layernorm.weight [0101/0485] saving model.layers.3.mlp.down_proj.q_weight [0102/0485] saving model.layers.3.mlp.down_proj.q_scale [0103/0485] saving model.layers.3.mlp.gate_up_proj.q_weight [0104/0485] saving model.layers.3.mlp.gate_up_proj.q_scale [0105/0485] saving model.layers.3.post_attention_layernorm.weight [0106/0485] saving model.layers.3.self_attn.qkv_proj.q_weight [0107/0485] saving model.layers.3.self_attn.qkv_proj.q_scale [0108/0485] saving model.layers.3.self_attn.o_proj.q_weight [0109/0485] saving model.layers.3.self_attn.o_proj.q_scale [0110/0485] saving model.layers.4.input_layernorm.weight [0111/0485] saving model.layers.4.mlp.down_proj.q_weight [0112/0485] saving model.layers.4.mlp.down_proj.q_scale [0113/0485] saving 
model.layers.4.mlp.gate_up_proj.q_weight [0114/0485] saving model.layers.4.mlp.gate_up_proj.q_scale [0115/0485] saving model.layers.4.post_attention_layernorm.weight [0116/0485] saving model.layers.4.self_attn.qkv_proj.q_weight [0117/0485] saving model.layers.4.self_attn.qkv_proj.q_scale [0118/0485] saving model.layers.4.self_attn.o_proj.q_weight [0119/0485] saving model.layers.4.self_attn.o_proj.q_scale [0120/0485] saving model.layers.5.input_layernorm.weight [0121/0485] saving model.layers.5.mlp.down_proj.q_weight [0122/0485] saving model.layers.5.mlp.down_proj.q_scale [0123/0485] saving model.layers.5.mlp.gate_up_proj.q_weight [0124/0485] saving model.layers.5.mlp.gate_up_proj.q_scale [0125/0485] saving model.layers.5.post_attention_layernorm.weight [0126/0485] saving model.layers.5.self_attn.qkv_proj.q_weight [0127/0485] saving model.layers.5.self_attn.qkv_proj.q_scale [0128/0485] saving model.layers.5.self_attn.o_proj.q_weight [0129/0485] saving model.layers.5.self_attn.o_proj.q_scale [0130/0485] saving model.layers.6.mlp.gate_up_proj.q_weight [0131/0485] saving model.layers.6.mlp.gate_up_proj.q_scale [0132/0485] saving model.layers.6.self_attn.qkv_proj.q_weight [0133/0485] saving model.layers.6.self_attn.qkv_proj.q_scale [0134/0485] saving model.layers.6.self_attn.o_proj.q_weight [0135/0485] saving model.layers.6.self_attn.o_proj.q_scale [0136/0485] saving model.layers.10.input_layernorm.weight [0137/0485] saving model.layers.10.mlp.down_proj.q_weight [0138/0485] saving model.layers.10.mlp.down_proj.q_scale [0139/0485] saving model.layers.10.mlp.gate_up_proj.q_weight [0140/0485] saving model.layers.10.mlp.gate_up_proj.q_scale [0141/0485] saving model.layers.10.post_attention_layernorm.weight [0142/0485] saving model.layers.10.self_attn.qkv_proj.q_weight [0143/0485] saving model.layers.10.self_attn.qkv_proj.q_scale [0144/0485] saving model.layers.10.self_attn.o_proj.q_weight [0145/0485] saving model.layers.10.self_attn.o_proj.q_scale [0146/0485] saving model.layers.11.input_layernorm.weight [0147/0485] saving model.layers.11.mlp.down_proj.q_weight [0148/0485] saving model.layers.11.mlp.down_proj.q_scale [0149/0485] saving model.layers.11.mlp.gate_up_proj.q_weight [0150/0485] saving model.layers.11.mlp.gate_up_proj.q_scale [0151/0485] saving model.layers.11.post_attention_layernorm.weight [0152/0485] saving model.layers.11.self_attn.qkv_proj.q_weight [0153/0485] saving model.layers.11.self_attn.qkv_proj.q_scale [0154/0485] saving model.layers.11.self_attn.o_proj.q_weight [0155/0485] saving model.layers.11.self_attn.o_proj.q_scale [0156/0485] saving model.layers.12.input_layernorm.weight [0157/0485] saving model.layers.12.mlp.down_proj.q_weight [0158/0485] saving model.layers.12.mlp.down_proj.q_scale [0159/0485] saving model.layers.12.mlp.gate_up_proj.q_weight [0160/0485] saving model.layers.12.mlp.gate_up_proj.q_scale [0161/0485] saving model.layers.12.post_attention_layernorm.weight [0162/0485] saving model.layers.12.self_attn.qkv_proj.q_weight [0163/0485] saving model.layers.12.self_attn.qkv_proj.q_scale [0164/0485] saving model.layers.12.self_attn.o_proj.q_weight [0165/0485] saving model.layers.12.self_attn.o_proj.q_scale [0166/0485] saving model.layers.13.mlp.gate_up_proj.q_weight [0167/0485] saving model.layers.13.mlp.gate_up_proj.q_scale [0168/0485] saving model.layers.13.self_attn.qkv_proj.q_weight [0169/0485] saving model.layers.13.self_attn.qkv_proj.q_scale [0170/0485] saving model.layers.13.self_attn.o_proj.q_weight [0171/0485] saving model.layers.13.self_attn.o_proj.q_scale 
[0172/0485] saving model.layers.6.input_layernorm.weight [0173/0485] saving model.layers.6.mlp.down_proj.q_weight [0174/0485] saving model.layers.6.mlp.down_proj.q_scale [0175/0485] saving model.layers.6.post_attention_layernorm.weight [0176/0485] saving model.layers.7.input_layernorm.weight [0177/0485] saving model.layers.7.mlp.down_proj.q_weight [0178/0485] saving model.layers.7.mlp.down_proj.q_scale [0179/0485] saving model.layers.7.mlp.gate_up_proj.q_weight [0180/0485] saving model.layers.7.mlp.gate_up_proj.q_scale [0181/0485] saving model.layers.7.post_attention_layernorm.weight [0182/0485] saving model.layers.7.self_attn.qkv_proj.q_weight [0183/0485] saving model.layers.7.self_attn.qkv_proj.q_scale [0184/0485] saving model.layers.7.self_attn.o_proj.q_weight [0185/0485] saving model.layers.7.self_attn.o_proj.q_scale [0186/0485] saving model.layers.8.input_layernorm.weight [0187/0485] saving model.layers.8.mlp.down_proj.q_weight [0188/0485] saving model.layers.8.mlp.down_proj.q_scale [0189/0485] saving model.layers.8.mlp.gate_up_proj.q_weight [0190/0485] saving model.layers.8.mlp.gate_up_proj.q_scale [0191/0485] saving model.layers.8.post_attention_layernorm.weight [0192/0485] saving model.layers.8.self_attn.qkv_proj.q_weight [0193/0485] saving model.layers.8.self_attn.qkv_proj.q_scale [0194/0485] saving model.layers.8.self_attn.o_proj.q_weight [0195/0485] saving model.layers.8.self_attn.o_proj.q_scale [0196/0485] saving model.layers.9.input_layernorm.weight [0197/0485] saving model.layers.9.mlp.down_proj.q_weight [0198/0485] saving model.layers.9.mlp.down_proj.q_scale [0199/0485] saving model.layers.9.mlp.gate_up_proj.q_weight [0200/0485] saving model.layers.9.mlp.gate_up_proj.q_scale [0201/0485] saving model.layers.9.post_attention_layernorm.weight [0202/0485] saving model.layers.9.self_attn.qkv_proj.q_weight [0203/0485] saving model.layers.9.self_attn.qkv_proj.q_scale [0204/0485] saving model.layers.9.self_attn.o_proj.q_weight [0205/0485] saving model.layers.9.self_attn.o_proj.q_scale [0206/0485] saving model.layers.13.input_layernorm.weight [0207/0485] saving model.layers.13.mlp.down_proj.q_weight [0208/0485] saving model.layers.13.mlp.down_proj.q_scale [0209/0485] saving model.layers.13.post_attention_layernorm.weight [0210/0485] saving model.layers.14.input_layernorm.weight [0211/0485] saving model.layers.14.mlp.down_proj.q_weight [0212/0485] saving model.layers.14.mlp.down_proj.q_scale [0213/0485] saving model.layers.14.mlp.gate_up_proj.q_weight [0214/0485] saving model.layers.14.mlp.gate_up_proj.q_scale [0215/0485] saving model.layers.14.post_attention_layernorm.weight [0216/0485] saving model.layers.14.self_attn.qkv_proj.q_weight [0217/0485] saving model.layers.14.self_attn.qkv_proj.q_scale [0218/0485] saving model.layers.14.self_attn.o_proj.q_weight [0219/0485] saving model.layers.14.self_attn.o_proj.q_scale [0220/0485] saving model.layers.15.input_layernorm.weight [0221/0485] saving model.layers.15.mlp.down_proj.q_weight [0222/0485] saving model.layers.15.mlp.down_proj.q_scale [0223/0485] saving model.layers.15.mlp.gate_up_proj.q_weight [0224/0485] saving model.layers.15.mlp.gate_up_proj.q_scale [0225/0485] saving model.layers.15.post_attention_layernorm.weight [0226/0485] saving model.layers.15.self_attn.qkv_proj.q_weight [0227/0485] saving model.layers.15.self_attn.qkv_proj.q_scale [0228/0485] saving model.layers.15.self_attn.o_proj.q_weight [0229/0485] saving model.layers.15.self_attn.o_proj.q_scale [0230/0485] saving model.layers.16.input_layernorm.weight [0231/0485] 
saving model.layers.16.mlp.down_proj.q_weight [0232/0485] saving model.layers.16.mlp.down_proj.q_scale [0233/0485] saving model.layers.16.mlp.gate_up_proj.q_weight [0234/0485] saving model.layers.16.mlp.gate_up_proj.q_scale [0235/0485] saving model.layers.16.post_attention_layernorm.weight [0236/0485] saving model.layers.16.self_attn.qkv_proj.q_weight [0237/0485] saving model.layers.16.self_attn.qkv_proj.q_scale [0238/0485] saving model.layers.16.self_attn.o_proj.q_weight [0239/0485] saving model.layers.16.self_attn.o_proj.q_scale [0240/0485] saving model.layers.17.input_layernorm.weight [0241/0485] saving model.layers.17.mlp.down_proj.q_weight [0242/0485] saving model.layers.17.mlp.down_proj.q_scale [0243/0485] saving model.layers.17.mlp.gate_up_proj.q_weight [0244/0485] saving model.layers.17.mlp.gate_up_proj.q_scale [0245/0485] saving model.layers.17.post_attention_layernorm.weight [0246/0485] saving model.layers.17.self_attn.qkv_proj.q_weight [0247/0485] saving model.layers.17.self_attn.qkv_proj.q_scale [0248/0485] saving model.layers.17.self_attn.o_proj.q_weight [0249/0485] saving model.layers.17.self_attn.o_proj.q_scale [0250/0485] saving model.layers.18.input_layernorm.weight [0251/0485] saving model.layers.18.mlp.down_proj.q_weight [0252/0485] saving model.layers.18.mlp.down_proj.q_scale [0253/0485] saving model.layers.18.mlp.gate_up_proj.q_weight [0254/0485] saving model.layers.18.mlp.gate_up_proj.q_scale [0255/0485] saving model.layers.18.post_attention_layernorm.weight [0256/0485] saving model.layers.18.self_attn.qkv_proj.q_weight [0257/0485] saving model.layers.18.self_attn.qkv_proj.q_scale [0258/0485] saving model.layers.18.self_attn.o_proj.q_weight [0259/0485] saving model.layers.18.self_attn.o_proj.q_scale [0260/0485] saving model.layers.19.input_layernorm.weight [0261/0485] saving model.layers.19.mlp.down_proj.q_weight [0262/0485] saving model.layers.19.mlp.down_proj.q_scale [0263/0485] saving model.layers.19.mlp.gate_up_proj.q_weight [0264/0485] saving model.layers.19.mlp.gate_up_proj.q_scale [0265/0485] saving model.layers.19.post_attention_layernorm.weight [0266/0485] saving model.layers.19.self_attn.qkv_proj.q_weight [0267/0485] saving model.layers.19.self_attn.qkv_proj.q_scale [0268/0485] saving model.layers.19.self_attn.o_proj.q_weight [0269/0485] saving model.layers.19.self_attn.o_proj.q_scale [0270/0485] saving model.layers.20.mlp.gate_up_proj.q_weight [0271/0485] saving model.layers.20.mlp.gate_up_proj.q_scale [0272/0485] saving model.layers.20.self_attn.qkv_proj.q_weight [0273/0485] saving model.layers.20.self_attn.qkv_proj.q_scale [0274/0485] saving model.layers.20.self_attn.o_proj.q_weight [0275/0485] saving model.layers.20.self_attn.o_proj.q_scale [0276/0485] saving model.layers.20.input_layernorm.weight [0277/0485] saving model.layers.20.mlp.down_proj.q_weight [0278/0485] saving model.layers.20.mlp.down_proj.q_scale [0279/0485] saving model.layers.20.post_attention_layernorm.weight [0280/0485] saving model.layers.21.input_layernorm.weight [0281/0485] saving model.layers.21.mlp.down_proj.q_weight [0282/0485] saving model.layers.21.mlp.down_proj.q_scale [0283/0485] saving model.layers.21.mlp.gate_up_proj.q_weight [0284/0485] saving model.layers.21.mlp.gate_up_proj.q_scale [0285/0485] saving model.layers.21.post_attention_layernorm.weight [0286/0485] saving model.layers.21.self_attn.qkv_proj.q_weight [0287/0485] saving model.layers.21.self_attn.qkv_proj.q_scale [0288/0485] saving model.layers.21.self_attn.o_proj.q_weight [0289/0485] saving 
model.layers.21.self_attn.o_proj.q_scale [0290/0485] saving model.layers.22.input_layernorm.weight [0291/0485] saving model.layers.22.mlp.down_proj.q_weight [0292/0485] saving model.layers.22.mlp.down_proj.q_scale [0293/0485] saving model.layers.22.mlp.gate_up_proj.q_weight [0294/0485] saving model.layers.22.mlp.gate_up_proj.q_scale [0295/0485] saving model.layers.22.post_attention_layernorm.weight [0296/0485] saving model.layers.22.self_attn.qkv_proj.q_weight [0297/0485] saving model.layers.22.self_attn.qkv_proj.q_scale [0298/0485] saving model.layers.22.self_attn.o_proj.q_weight [0299/0485] saving model.layers.22.self_attn.o_proj.q_scale [0300/0485] saving model.layers.23.input_layernorm.weight [0301/0485] saving model.layers.23.mlp.down_proj.q_weight [0302/0485] saving model.layers.23.mlp.down_proj.q_scale [0303/0485] saving model.layers.23.mlp.gate_up_proj.q_weight [0304/0485] saving model.layers.23.mlp.gate_up_proj.q_scale [0305/0485] saving model.layers.23.post_attention_layernorm.weight [0306/0485] saving model.layers.23.self_attn.qkv_proj.q_weight [0307/0485] saving model.layers.23.self_attn.qkv_proj.q_scale [0308/0485] saving model.layers.23.self_attn.o_proj.q_weight [0309/0485] saving model.layers.23.self_attn.o_proj.q_scale [0310/0485] saving model.layers.24.input_layernorm.weight [0311/0485] saving model.layers.24.mlp.down_proj.q_weight [0312/0485] saving model.layers.24.mlp.down_proj.q_scale [0313/0485] saving model.layers.24.mlp.gate_up_proj.q_weight [0314/0485] saving model.layers.24.mlp.gate_up_proj.q_scale [0315/0485] saving model.layers.24.post_attention_layernorm.weight [0316/0485] saving model.layers.24.self_attn.qkv_proj.q_weight [0317/0485] saving model.layers.24.self_attn.qkv_proj.q_scale [0318/0485] saving model.layers.24.self_attn.o_proj.q_weight [0319/0485] saving model.layers.24.self_attn.o_proj.q_scale [0320/0485] saving model.layers.25.input_layernorm.weight [0321/0485] saving model.layers.25.mlp.down_proj.q_weight [0322/0485] saving model.layers.25.mlp.down_proj.q_scale [0323/0485] saving model.layers.25.mlp.gate_up_proj.q_weight [0324/0485] saving model.layers.25.mlp.gate_up_proj.q_scale [0325/0485] saving model.layers.25.post_attention_layernorm.weight [0326/0485] saving model.layers.25.self_attn.qkv_proj.q_weight [0327/0485] saving model.layers.25.self_attn.qkv_proj.q_scale [0328/0485] saving model.layers.25.self_attn.o_proj.q_weight [0329/0485] saving model.layers.25.self_attn.o_proj.q_scale [0330/0485] saving model.layers.26.input_layernorm.weight [0331/0485] saving model.layers.26.mlp.down_proj.q_weight [0332/0485] saving model.layers.26.mlp.down_proj.q_scale [0333/0485] saving model.layers.26.mlp.gate_up_proj.q_weight [0334/0485] saving model.layers.26.mlp.gate_up_proj.q_scale [0335/0485] saving model.layers.26.post_attention_layernorm.weight [0336/0485] saving model.layers.26.self_attn.qkv_proj.q_weight [0337/0485] saving model.layers.26.self_attn.qkv_proj.q_scale [0338/0485] saving model.layers.26.self_attn.o_proj.q_weight [0339/0485] saving model.layers.26.self_attn.o_proj.q_scale [0340/0485] saving model.layers.27.mlp.gate_up_proj.q_weight [0341/0485] saving model.layers.27.mlp.gate_up_proj.q_scale [0342/0485] saving model.layers.27.self_attn.qkv_proj.q_weight [0343/0485] saving model.layers.27.self_attn.qkv_proj.q_scale [0344/0485] saving model.layers.27.self_attn.o_proj.q_weight [0345/0485] saving model.layers.27.self_attn.o_proj.q_scale [0346/0485] saving model.layers.27.input_layernorm.weight [0347/0485] saving 
model.layers.27.mlp.down_proj.q_weight [0348/0485] saving model.layers.27.mlp.down_proj.q_scale [0349/0485] saving model.layers.27.post_attention_layernorm.weight [0350/0485] saving model.layers.28.input_layernorm.weight [0351/0485] saving model.layers.28.mlp.down_proj.q_weight [0352/0485] saving model.layers.28.mlp.down_proj.q_scale [0353/0485] saving model.layers.28.mlp.gate_up_proj.q_weight [0354/0485] saving model.layers.28.mlp.gate_up_proj.q_scale [0355/0485] saving model.layers.28.post_attention_layernorm.weight [0356/0485] saving model.layers.28.self_attn.qkv_proj.q_weight [0357/0485] saving model.layers.28.self_attn.qkv_proj.q_scale [0358/0485] saving model.layers.28.self_attn.o_proj.q_weight [0359/0485] saving model.layers.28.self_attn.o_proj.q_scale [0360/0485] saving model.layers.29.input_layernorm.weight [0361/0485] saving model.layers.29.mlp.down_proj.q_weight [0362/0485] saving model.layers.29.mlp.down_proj.q_scale [0363/0485] saving model.layers.29.mlp.gate_up_proj.q_weight [0364/0485] saving model.layers.29.mlp.gate_up_proj.q_scale [0365/0485] saving model.layers.29.post_attention_layernorm.weight [0366/0485] saving model.layers.29.self_attn.qkv_proj.q_weight [0367/0485] saving model.layers.29.self_attn.qkv_proj.q_scale [0368/0485] saving model.layers.29.self_attn.o_proj.q_weight [0369/0485] saving model.layers.29.self_attn.o_proj.q_scale [0370/0485] saving model.layers.30.input_layernorm.weight [0371/0485] saving model.layers.30.mlp.down_proj.q_weight [0372/0485] saving model.layers.30.mlp.down_proj.q_scale [0373/0485] saving model.layers.30.mlp.gate_up_proj.q_weight [0374/0485] saving model.layers.30.mlp.gate_up_proj.q_scale [0375/0485] saving model.layers.30.post_attention_layernorm.weight [0376/0485] saving model.layers.30.self_attn.qkv_proj.q_weight [0377/0485] saving model.layers.30.self_attn.qkv_proj.q_scale [0378/0485] saving model.layers.30.self_attn.o_proj.q_weight [0379/0485] saving model.layers.30.self_attn.o_proj.q_scale [0380/0485] saving model.layers.31.input_layernorm.weight [0381/0485] saving model.layers.31.mlp.down_proj.q_weight [0382/0485] saving model.layers.31.mlp.down_proj.q_scale [0383/0485] saving model.layers.31.mlp.gate_up_proj.q_weight [0384/0485] saving model.layers.31.mlp.gate_up_proj.q_scale [0385/0485] saving model.layers.31.post_attention_layernorm.weight [0386/0485] saving model.layers.31.self_attn.qkv_proj.q_weight [0387/0485] saving model.layers.31.self_attn.qkv_proj.q_scale [0388/0485] saving model.layers.31.self_attn.o_proj.q_weight [0389/0485] saving model.layers.31.self_attn.o_proj.q_scale [0390/0485] saving model.layers.32.input_layernorm.weight [0391/0485] saving model.layers.32.mlp.down_proj.q_weight [0392/0485] saving model.layers.32.mlp.down_proj.q_scale [0393/0485] saving model.layers.32.mlp.gate_up_proj.q_weight [0394/0485] saving model.layers.32.mlp.gate_up_proj.q_scale [0395/0485] saving model.layers.32.post_attention_layernorm.weight [0396/0485] saving model.layers.32.self_attn.qkv_proj.q_weight [0397/0485] saving model.layers.32.self_attn.qkv_proj.q_scale [0398/0485] saving model.layers.32.self_attn.o_proj.q_weight [0399/0485] saving model.layers.32.self_attn.o_proj.q_scale [0400/0485] saving model.layers.33.input_layernorm.weight [0401/0485] saving model.layers.33.mlp.down_proj.q_weight [0402/0485] saving model.layers.33.mlp.down_proj.q_scale [0403/0485] saving model.layers.33.mlp.gate_up_proj.q_weight [0404/0485] saving model.layers.33.mlp.gate_up_proj.q_scale [0405/0485] saving 
model.layers.33.post_attention_layernorm.weight [0406/0485] saving model.layers.33.self_attn.qkv_proj.q_weight [0407/0485] saving model.layers.33.self_attn.qkv_proj.q_scale [0408/0485] saving model.layers.33.self_attn.o_proj.q_weight [0409/0485] saving model.layers.33.self_attn.o_proj.q_scale [0410/0485] saving model.layers.34.mlp.gate_up_proj.q_weight [0411/0485] saving model.layers.34.mlp.gate_up_proj.q_scale [0412/0485] saving model.layers.34.self_attn.qkv_proj.q_weight [0413/0485] saving model.layers.34.self_attn.qkv_proj.q_scale [0414/0485] saving model.layers.34.self_attn.o_proj.q_weight [0415/0485] saving model.layers.34.self_attn.o_proj.q_scale [0416/0485] saving model.layers.34.input_layernorm.weight [0417/0485] saving model.layers.34.mlp.down_proj.q_weight [0418/0485] saving model.layers.34.mlp.down_proj.q_scale [0419/0485] saving model.layers.34.post_attention_layernorm.weight [0420/0485] saving model.layers.35.input_layernorm.weight [0421/0485] saving model.layers.35.mlp.down_proj.q_weight [0422/0485] saving model.layers.35.mlp.down_proj.q_scale [0423/0485] saving model.layers.35.mlp.gate_up_proj.q_weight [0424/0485] saving model.layers.35.mlp.gate_up_proj.q_scale [0425/0485] saving model.layers.35.post_attention_layernorm.weight [0426/0485] saving model.layers.35.self_attn.qkv_proj.q_weight [0427/0485] saving model.layers.35.self_attn.qkv_proj.q_scale [0428/0485] saving model.layers.35.self_attn.o_proj.q_weight [0429/0485] saving model.layers.35.self_attn.o_proj.q_scale [0430/0485] saving model.layers.36.input_layernorm.weight [0431/0485] saving model.layers.36.mlp.down_proj.q_weight [0432/0485] saving model.layers.36.mlp.down_proj.q_scale [0433/0485] saving model.layers.36.mlp.gate_up_proj.q_weight [0434/0485] saving model.layers.36.mlp.gate_up_proj.q_scale [0435/0485] saving model.layers.36.post_attention_layernorm.weight [0436/0485] saving model.layers.36.self_attn.qkv_proj.q_weight [0437/0485] saving model.layers.36.self_attn.qkv_proj.q_scale [0438/0485] saving model.layers.36.self_attn.o_proj.q_weight [0439/0485] saving model.layers.36.self_attn.o_proj.q_scale [0440/0485] saving model.layers.37.input_layernorm.weight [0441/0485] saving model.layers.37.mlp.down_proj.q_weight [0442/0485] saving model.layers.37.mlp.down_proj.q_scale [0443/0485] saving model.layers.37.mlp.gate_up_proj.q_weight [0444/0485] saving model.layers.37.mlp.gate_up_proj.q_scale [0445/0485] saving model.layers.37.post_attention_layernorm.weight [0446/0485] saving model.layers.37.self_attn.qkv_proj.q_weight [0447/0485] saving model.layers.37.self_attn.qkv_proj.q_scale [0448/0485] saving model.layers.37.self_attn.o_proj.q_weight [0449/0485] saving model.layers.37.self_attn.o_proj.q_scale [0450/0485] saving model.layers.38.input_layernorm.weight
[2024-01-29 22:02:25] INFO convert_weight.py:143: Saved to directory: [1m/var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9[0m
[0451/0485] saving model.layers.38.mlp.down_proj.q_weight [0452/0485] saving model.layers.38.mlp.down_proj.q_scale [0453/0485] saving model.layers.38.mlp.gate_up_proj.q_weight [0454/0485] saving model.layers.38.mlp.gate_up_proj.q_scale [0455/0485] saving model.layers.38.post_attention_layernorm.weight [0456/0485] saving model.layers.38.self_attn.qkv_proj.q_weight [0457/0485] saving model.layers.38.self_attn.qkv_proj.q_scale [0458/0485] saving model.layers.38.self_attn.o_proj.q_weight [0459/0485] saving model.layers.38.self_attn.o_proj.q_scale [0460/0485] saving model.layers.39.input_layernorm.weight [0461/0485] saving model.layers.39.mlp.down_proj.q_weight [0462/0485] saving model.layers.39.mlp.down_proj.q_scale [0463/0485] saving model.layers.39.mlp.gate_up_proj.q_weight [0464/0485] saving model.layers.39.mlp.gate_up_proj.q_scale [0465/0485] saving model.layers.39.post_attention_layernorm.weight [0466/0485] saving model.layers.39.self_attn.qkv_proj.q_weight [0467/0485] saving model.layers.39.self_attn.qkv_proj.q_scale [0468/0485] saving model.layers.39.self_attn.o_proj.q_weight [0469/0485] saving model.layers.39.self_attn.o_proj.q_scale [0470/0485] saving model.layers.40.input_layernorm.weight [0471/0485] saving model.layers.40.mlp.down_proj.q_weight [0472/0485] saving model.layers.40.mlp.down_proj.q_scale [0473/0485] saving model.layers.40.mlp.gate_up_proj.q_weight [0474/0485] saving model.layers.40.mlp.gate_up_proj.q_scale [0475/0485] saving model.layers.40.post_attention_layernorm.weight [0476/0485] saving model.layers.40.self_attn.qkv_proj.q_weight [0477/0485] saving model.layers.40.self_attn.qkv_proj.q_scale [0478/0485] saving model.layers.40.self_attn.o_proj.q_weight [0479/0485] saving model.layers.40.self_attn.o_proj.q_scale [0480/0485] saving model.layers.41.mlp.gate_up_proj.q_weight [0481/0485] saving model.layers.41.mlp.gate_up_proj.q_scale [0482/0485] saving model.layers.41.self_attn.qkv_proj.q_weight [0483/0485] saving model.layers.41.self_attn.qkv_proj.q_scale [0484/0485] saving model.layers.41.self_attn.o_proj.q_weight [0485/0485] saving model.layers.41.self_attn.o_proj.q_scale
All finished, 275 total shards committed, record saved to /var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9/ndarray-cache.json
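
The ndarray-cache.json record named in the final line can be sanity-checked with a short script. The sketch below is illustrative only: it assumes the file follows the usual MLC/TVM NDArray-cache layout (a top-level "records" list of shard entries, each with a nested "records" list of parameter entries and an "nbytes" size), and it reads those fields defensively since the exact schema may differ across mlc_chat versions. The path is the temporary --output directory from this run; substitute your own.

    import json
    from pathlib import Path

    # Output directory taken from the log above (a temp dir in this run);
    # point this at the --output directory of your own conversion.
    cache_path = Path("/var/folders/50/mzqbqxqj5fddcby2mg3h334c0000gp/T/tmpe1uh65a9/ndarray-cache.json")

    cache = json.loads(cache_path.read_text())

    # Assumed layout: "records" lists the params_shard_*.bin files, and each
    # shard entry nests a "records" list of the parameters packed into it.
    # Fields are read with .get() in case the schema differs.
    shards = cache.get("records", [])
    params = [p for shard in shards for p in shard.get("records", [])]
    total_bytes = sum(shard.get("nbytes", 0) for shard in shards)

    print(f"shards     : {len(shards)}")    # compare with "275 total shards committed"
    print(f"parameters : {len(params)}")    # compare with the 0485 counter in the progress output
    print(f"total size : {total_bytes / 2**20:.1f} MiB")
    for p in params[:5]:                    # peek at a few entries
        print(p.get("name"), p.get("dtype"), p.get("shape"))

As the progress output shows, each quantized q4f16_1 projection is stored as a q_weight / q_scale pair, while the layernorm weights are saved as plain tensors, so the parameter count in the cache record should line up with the 485 tensors reported above.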