/opt/conda/envs/py310/bin/python -m mlc_llm gen_config /models/Qwen2-7B-Instruct --quantization q0f16 --conv-template chatml --output /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC
[2024-06-06 23:44:36] INFO auto_config.py:116: Found model configuration: /models/Qwen2-7B-Instruct/config.json
[2024-06-06 23:44:36] INFO auto_config.py:154: Found model type: qwen2. Use `--model-type` to override.
[2024-06-06 23:44:36] INFO qwen2_model.py:49: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-06-06 23:44:36] INFO qwen2_model.py:66: prefill_chunk_size defaults to 2048
[2024-06-06 23:44:36] INFO config.py:107: Overriding max_batch_size from 1 to 80
[2024-06-06 23:44:36] INFO gen_config.py:143: [generation_config.json] Setting bos_token_id: 151643
[2024-06-06 23:44:36] INFO gen_config.py:143: [generation_config.json] Setting pad_token_id: 151643
[2024-06-06 23:44:36] INFO gen_config.py:143: [generation_config.json] Setting eos_token_id: [151645, 151643]
[2024-06-06 23:44:36] INFO gen_config.py:143: [generation_config.json] Setting repetition_penalty: 1.05
[2024-06-06 23:44:36] INFO gen_config.py:143: [generation_config.json] Setting temperature: 0.7
[2024-06-06 23:44:36] INFO gen_config.py:143: [generation_config.json] Setting top_p: 0.8
[2024-06-06 23:44:36] INFO gen_config.py:157: Not found tokenizer config: /models/Qwen2-7B-Instruct/tokenizer.model
[2024-06-06 23:44:36] INFO gen_config.py:155: Found tokenizer config: /models/Qwen2-7B-Instruct/tokenizer.json. Copying to /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC/tokenizer.json
[2024-06-06 23:44:36] INFO gen_config.py:155: Found tokenizer config: /models/Qwen2-7B-Instruct/vocab.json. Copying to /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC/vocab.json
[2024-06-06 23:44:36] INFO gen_config.py:155: Found tokenizer config: /models/Qwen2-7B-Instruct/merges.txt. Copying to /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC/merges.txt
[2024-06-06 23:44:36] INFO gen_config.py:157: Not found tokenizer config: /models/Qwen2-7B-Instruct/added_tokens.json
[2024-06-06 23:44:36] INFO gen_config.py:155: Found tokenizer config: /models/Qwen2-7B-Instruct/tokenizer_config.json. Copying to /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC/tokenizer_config.json
[2024-06-06 23:44:36] INFO gen_config.py:216: Detected tokenizer info: {'token_postproc_method': 'byte_level', 'prepend_space_in_encode': False, 'strip_space_in_decode': False}
[2024-06-06 23:44:36] INFO gen_config.py:32: [System default] Setting presence_penalty: 0.0
[2024-06-06 23:44:36] INFO gen_config.py:32: [System default] Setting frequency_penalty: 0.0
[2024-06-06 23:44:36] INFO gen_config.py:223: Dumping configuration file to: /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC/mlc-chat-config.json
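For reference, the values logged above are what end up in the generated mlc-chat-config.json. An abbreviated sketch of the relevant fields, reconstructed from the log lines only (field layout approximate, not a verbatim copy of the file):

{
  "model_type": "qwen2",
  "quantization": "q0f16",
  "conv_template": "chatml",
  "context_window_size": 32768,
  "prefill_chunk_size": 2048,
  "temperature": 0.7,
  "top_p": 0.8,
  "repetition_penalty": 1.05,
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": [151645, 151643],
  "pad_token_id": 151643,
  "tokenizer_files": ["tokenizer.json", "vocab.json", "merges.txt", "tokenizer_config.json"]
}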
/opt/conda/envs/py310/bin/python -m mlc_llm convert_weight /models/Qwen2-7B-Instruct --quantization q0f16 --output /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC
[2024-06-06 23:44:38] INFO auto_config.py:116: Found model configuration: /models/Qwen2-7B-Instruct/config.json
[2024-06-06 23:44:39] INFO auto_device.py:79: Found device: cuda:0
[2024-06-06 23:44:41] INFO auto_device.py:88: Not found device: rocm:0
[2024-06-06 23:44:42] INFO auto_device.py:88: Not found device: metal:0
[2024-06-06 23:44:44] INFO auto_device.py:79: Found device: vulkan:0
[2024-06-06 23:44:44] INFO auto_device.py:79: Found device: vulkan:1
[2024-06-06 23:44:44] INFO auto_device.py:79: Found device: vulkan:2
[2024-06-06 23:44:44] INFO auto_device.py:79: Found device: vulkan:3
[2024-06-06 23:44:45] INFO auto_device.py:88: Not found device: opencl:0
[2024-06-06 23:44:45] INFO auto_device.py:35: Using device: cuda:0
[2024-06-06 23:44:45] INFO auto_weight.py:71: Finding weights in: /models/Qwen2-7B-Instruct
[2024-06-06 23:44:45] INFO auto_weight.py:137: Not found Huggingface PyTorch
[2024-06-06 23:44:45] INFO auto_weight.py:144: Found source weight format: huggingface-safetensor. Source configuration: /models/Qwen2-7B-Instruct/model.safetensors.index.json
[2024-06-06 23:44:45] INFO auto_weight.py:107: Using source weight configuration: /models/Qwen2-7B-Instruct/model.safetensors.index.json. Use `--source` to override.
[2024-06-06 23:44:45] INFO auto_weight.py:111: Using source weight format: huggingface-safetensor. Use `--source-format` to override.
[2024-06-06 23:44:45] INFO auto_config.py:154: Found model type: qwen2. Use `--model-type` to override.
[2024-06-06 23:44:45] INFO qwen2_model.py:49: context_window_size not found in config.json. Falling back to max_position_embeddings (32768)
[2024-06-06 23:44:45] INFO qwen2_model.py:66: prefill_chunk_size defaults to 2048
Weight conversion with arguments:
--config /models/Qwen2-7B-Instruct/config.json
--quantization NoQuantize(name='q0f16', kind='no-quant', model_dtype='float16')
--model-type qwen2
--device cuda:0
--source /models/Qwen2-7B-Instruct/model.safetensors.index.json
--source-format huggingface-safetensor
--output /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC
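Every value in the argument list above was auto-detected; the log notes that --source, --source-format, and --model-type can be passed explicitly to override detection. Spelled out, the same conversion corresponds to an invocation along these lines (a sketch assembled from the reported arguments, not a command taken from the log):

/opt/conda/envs/py310/bin/python -m mlc_llm convert_weight /models/Qwen2-7B-Instruct \
    --quantization q0f16 \
    --model-type qwen2 \
    --device cuda:0 \
    --source /models/Qwen2-7B-Instruct/model.safetensors.index.json \
    --source-format huggingface-safetensor \
    --output /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC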
Start storing to cache /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC
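The progress output that follows shows the loader streaming one safetensors shard at a time (note the Loading/Unloading pairs), emitting each parameter under MLC's naming scheme and, since q0f16 means no quantization, storing it as plain float16. A minimal, hypothetical sketch of that access pattern using the safetensors library (illustrative only; not MLC's actual huggingface_loader code; store_to_cache is a made-up placeholder for writing into the ndarray cache):

import torch
from safetensors import safe_open

shards = [
    "/models/Qwen2-7B-Instruct/model-00004-of-00004.safetensors",
    "/models/Qwen2-7B-Instruct/model-00001-of-00004.safetensors",
    # ...remaining shards listed in model.safetensors.index.json
]

for shard in shards:
    # Open one shard, convert its parameters, then release it to keep peak RAM low.
    with safe_open(shard, framework="pt", device="cpu") as f:
        for name in f.keys():
            tensor = f.get_tensor(name)  # e.g. "model.layers.22.mlp.down_proj.weight"
            # q0f16 (kind "no-quant", model_dtype float16): convert dtype only, no quantization.
            store_to_cache(name, tensor.to(torch.float16))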
0%| | 0/199 [00:00<?, ?it/s] [2024-06-06 23:44:46] INFO huggingface_loader.py:185: Loading HF parameters from: /models/Qwen2-7B-Instruct/model-00004-of-00004.safetensors
0%| | 0/199 [00:00<?, ?it/s] [2024-06-06 23:44:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "lm_head.weight", shape: (152064, 3584), dtype: float16
0%| | 0/199 [00:06<?, ?it/s] 1%| | 1/199 [00:09<30:33, 9.26s/it] [2024-06-06 23:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.input_layernorm.weight", shape: (3584,), dtype: float16
1%| | 1/199 [00:09<30:33, 9.26s/it] 1%| | 2/199 [00:09<12:43, 3.87s/it] [2024-06-06 23:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
1%| | 2/199 [00:09<12:43, 3.87s/it] 2%|▏ | 3/199 [00:09<07:48, 2.39s/it] [2024-06-06 23:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.post_attention_layernorm.weight", shape: (3584,), dtype: float16
2%|▏ | 3/199 [00:09<07:48, 2.39s/it] [2024-06-06 23:44:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.input_layernorm.weight", shape: (3584,), dtype: float16
2%|▏ | 3/199 [00:10<07:48, 2.39s/it] [2024-06-06 23:44:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
2%|▏ | 3/199 [00:10<07:48, 2.39s/it] 3%|β–Ž | 6/199 [00:10<03:03, 1.05it/s] [2024-06-06 23:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
3%|β–Ž | 6/199 [00:11<03:03, 1.05it/s] 4%|β–Ž | 7/199 [00:11<03:20, 1.04s/it] [2024-06-06 23:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.post_attention_layernorm.weight", shape: (3584,), dtype: float16
4%|β–Ž | 7/199 [00:11<03:20, 1.04s/it] [2024-06-06 23:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.self_attn.c_attn.bias", shape: (4608,), dtype: float16
4%|β–Ž | 7/199 [00:12<03:20, 1.04s/it] [2024-06-06 23:44:58] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
4%|β–Ž | 7/199 [00:12<03:20, 1.04s/it] 5%|β–Œ | 10/199 [00:12<01:41, 1.86it/s] [2024-06-06 23:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.23.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
5%|β–Œ | 10/199 [00:12<01:41, 1.86it/s] 6%|β–Œ | 11/199 [00:12<01:25, 2.20it/s] [2024-06-06 23:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.input_layernorm.weight", shape: (3584,), dtype: float16
6%|β–Œ | 11/199 [00:12<01:25, 2.20it/s] [2024-06-06 23:44:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
6%|β–Œ | 11/199 [00:12<01:25, 2.20it/s] 7%|β–‹ | 13/199 [00:12<01:15, 2.45it/s] [2024-06-06 23:45:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
7%|β–‹ | 13/199 [00:13<01:15, 2.45it/s] 7%|β–‹ | 14/199 [00:14<01:50, 1.68it/s] [2024-06-06 23:45:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.post_attention_layernorm.weight", shape: (3584,), dtype: float16
7%|β–‹ | 14/199 [00:14<01:50, 1.68it/s] [2024-06-06 23:45:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.self_attn.c_attn.bias", shape: (4608,), dtype: float16
7%|β–‹ | 14/199 [00:14<01:50, 1.68it/s] [2024-06-06 23:45:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
7%|β–‹ | 14/199 [00:14<01:50, 1.68it/s] 9%|β–Š | 17/199 [00:14<01:02, 2.90it/s] [2024-06-06 23:45:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.24.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
9%|β–Š | 17/199 [00:14<01:02, 2.90it/s] 9%|β–‰ | 18/199 [00:14<00:54, 3.31it/s] [2024-06-06 23:45:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.input_layernorm.weight", shape: (3584,), dtype: float16
9%|β–‰ | 18/199 [00:14<00:54, 3.31it/s] [2024-06-06 23:45:01] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
9%|β–‰ | 18/199 [00:14<00:54, 3.31it/s] 10%|β–ˆ | 20/199 [00:15<00:54, 3.29it/s] [2024-06-06 23:45:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
10%|β–ˆ | 20/199 [00:15<00:54, 3.29it/s] 11%|β–ˆ | 21/199 [00:16<01:28, 2.01it/s] [2024-06-06 23:45:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.post_attention_layernorm.weight", shape: (3584,), dtype: float16
11%|β–ˆ | 21/199 [00:16<01:28, 2.01it/s] [2024-06-06 23:45:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.self_attn.c_attn.bias", shape: (4608,), dtype: float16
11%|β–ˆ | 21/199 [00:16<01:28, 2.01it/s] [2024-06-06 23:45:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
11%|β–ˆ | 21/199 [00:16<01:28, 2.01it/s] 12%|β–ˆβ– | 24/199 [00:16<00:51, 3.38it/s] [2024-06-06 23:45:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.25.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
12%|β–ˆβ– | 24/199 [00:16<00:51, 3.38it/s] 13%|β–ˆβ–Ž | 25/199 [00:16<00:45, 3.83it/s] [2024-06-06 23:45:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.input_layernorm.weight", shape: (3584,), dtype: float16
13%|β–ˆβ–Ž | 25/199 [00:16<00:45, 3.83it/s] [2024-06-06 23:45:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
13%|β–ˆβ–Ž | 25/199 [00:16<00:45, 3.83it/s] 14%|β–ˆβ–Ž | 27/199 [00:17<00:47, 3.62it/s] [2024-06-06 23:45:04] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
14%|β–ˆβ–Ž | 27/199 [00:17<00:47, 3.62it/s] 14%|β–ˆβ– | 28/199 [00:18<01:21, 2.11it/s] [2024-06-06 23:45:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.post_attention_layernorm.weight", shape: (3584,), dtype: float16
14%|β–ˆβ– | 28/199 [00:18<01:21, 2.11it/s] [2024-06-06 23:45:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.self_attn.c_attn.bias", shape: (4608,), dtype: float16
14%|β–ˆβ– | 28/199 [00:18<01:21, 2.11it/s] [2024-06-06 23:45:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
14%|β–ˆβ– | 28/199 [00:18<01:21, 2.11it/s] 16%|β–ˆβ–Œ | 31/199 [00:18<00:47, 3.52it/s] [2024-06-06 23:45:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.26.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
16%|β–ˆβ–Œ | 31/199 [00:18<00:47, 3.52it/s] 16%|β–ˆβ–Œ | 32/199 [00:18<00:41, 3.98it/s] [2024-06-06 23:45:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.input_layernorm.weight", shape: (3584,), dtype: float16
16%|β–ˆβ–Œ | 32/199 [00:18<00:41, 3.98it/s] [2024-06-06 23:45:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
16%|β–ˆβ–Œ | 32/199 [00:19<00:41, 3.98it/s] 17%|β–ˆβ–‹ | 34/199 [00:19<00:44, 3.71it/s] [2024-06-06 23:45:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
17%|β–ˆβ–‹ | 34/199 [00:19<00:44, 3.71it/s] 18%|β–ˆβ–Š | 35/199 [00:20<01:17, 2.13it/s] [2024-06-06 23:45:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.post_attention_layernorm.weight", shape: (3584,), dtype: float16
18%|β–ˆβ–Š | 35/199 [00:20<01:17, 2.13it/s] [2024-06-06 23:45:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.self_attn.c_attn.bias", shape: (4608,), dtype: float16
18%|β–ˆβ–Š | 35/199 [00:20<01:17, 2.13it/s] [2024-06-06 23:45:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
18%|β–ˆβ–Š | 35/199 [00:20<01:17, 2.13it/s] 19%|β–ˆβ–‰ | 38/199 [00:20<00:45, 3.54it/s] [2024-06-06 23:45:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.27.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
19%|β–ˆβ–‰ | 38/199 [00:20<00:45, 3.54it/s] 20%|β–ˆβ–‰ | 39/199 [00:21<00:40, 3.99it/s] [2024-06-06 23:45:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.norm.weight", shape: (3584,), dtype: float16
20%|β–ˆβ–‰ | 39/199 [00:21<00:40, 3.99it/s] [2024-06-06 23:45:07] INFO huggingface_loader.py:197: Unloading HF weight file: /models/Qwen2-7B-Instruct/model-00004-of-00004.safetensors
20%|β–ˆβ–‰ | 39/199 [00:21<00:40, 3.99it/s] [2024-06-06 23:45:08] INFO huggingface_loader.py:185: Loading HF parameters from: /models/Qwen2-7B-Instruct/model-00001-of-00004.safetensors
20%|β–ˆβ–‰ | 39/199 [00:21<00:40, 3.99it/s] [2024-06-06 23:45:14] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.embed_tokens.weight", shape: (152064, 3584), dtype: float16
20%|β–ˆβ–‰ | 39/199 [00:27<00:40, 3.99it/s] 21%|β–ˆβ–ˆ | 41/199 [00:30<04:34, 1.74s/it] [2024-06-06 23:45:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.input_layernorm.weight", shape: (3584,), dtype: float16
21%|β–ˆβ–ˆ | 41/199 [00:30<04:34, 1.74s/it] [2024-06-06 23:45:17] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
21%|β–ˆβ–ˆ | 41/199 [00:30<04:34, 1.74s/it] 22%|β–ˆβ–ˆβ– | 43/199 [00:31<03:21, 1.29s/it] [2024-06-06 23:45:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
22%|β–ˆβ–ˆβ– | 43/199 [00:31<03:21, 1.29s/it] 22%|β–ˆβ–ˆβ– | 44/199 [00:32<03:20, 1.29s/it] [2024-06-06 23:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.post_attention_layernorm.weight", shape: (3584,), dtype: float16
22%|β–ˆβ–ˆβ– | 44/199 [00:32<03:20, 1.29s/it] [2024-06-06 23:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.c_attn.bias", shape: (4608,), dtype: float16
22%|β–ˆβ–ˆβ– | 44/199 [00:32<03:20, 1.29s/it] [2024-06-06 23:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
22%|β–ˆβ–ˆβ– | 44/199 [00:32<03:20, 1.29s/it] 24%|β–ˆβ–ˆβ–Ž | 47/199 [00:32<01:52, 1.35it/s] [2024-06-06 23:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.0.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
24%|β–ˆβ–ˆβ–Ž | 47/199 [00:32<01:52, 1.35it/s] 24%|β–ˆβ–ˆβ– | 48/199 [00:32<01:35, 1.58it/s] [2024-06-06 23:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.input_layernorm.weight", shape: (3584,), dtype: float16
24%|β–ˆβ–ˆβ– | 48/199 [00:32<01:35, 1.58it/s] [2024-06-06 23:45:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
24%|β–ˆβ–ˆβ– | 48/199 [00:32<01:35, 1.58it/s] 25%|β–ˆβ–ˆβ–Œ | 50/199 [00:33<01:18, 1.90it/s] [2024-06-06 23:45:20] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
25%|β–ˆβ–ˆβ–Œ | 50/199 [00:33<01:18, 1.90it/s] 26%|β–ˆβ–ˆβ–Œ | 51/199 [00:34<01:41, 1.46it/s] [2024-06-06 23:45:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.post_attention_layernorm.weight", shape: (3584,), dtype: float16
26%|β–ˆβ–ˆβ–Œ | 51/199 [00:34<01:41, 1.46it/s] [2024-06-06 23:45:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.c_attn.bias", shape: (4608,), dtype: float16
26%|β–ˆβ–ˆβ–Œ | 51/199 [00:34<01:41, 1.46it/s] [2024-06-06 23:45:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
26%|β–ˆβ–ˆβ–Œ | 51/199 [00:34<01:41, 1.46it/s] 27%|β–ˆβ–ˆβ–‹ | 54/199 [00:34<00:57, 2.50it/s] [2024-06-06 23:45:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.1.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
27%|β–ˆβ–ˆβ–‹ | 54/199 [00:34<00:57, 2.50it/s] 28%|β–ˆβ–ˆβ–Š | 55/199 [00:34<00:49, 2.88it/s] [2024-06-06 23:45:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.input_layernorm.weight", shape: (3584,), dtype: float16
28%|β–ˆβ–ˆβ–Š | 55/199 [00:34<00:49, 2.88it/s] [2024-06-06 23:45:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
28%|β–ˆβ–ˆβ–Š | 55/199 [00:35<00:49, 2.88it/s] 29%|β–ˆβ–ˆβ–Š | 57/199 [00:35<00:47, 3.01it/s] [2024-06-06 23:45:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
29%|β–ˆβ–ˆβ–Š | 57/199 [00:36<00:47, 3.01it/s] 29%|β–ˆβ–ˆβ–‰ | 58/199 [00:36<01:12, 1.94it/s] [2024-06-06 23:45:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.post_attention_layernorm.weight", shape: (3584,), dtype: float16
29%|β–ˆβ–ˆβ–‰ | 58/199 [00:36<01:12, 1.94it/s] [2024-06-06 23:45:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.c_attn.bias", shape: (4608,), dtype: float16
29%|β–ˆβ–ˆβ–‰ | 58/199 [00:36<01:12, 1.94it/s] [2024-06-06 23:45:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
29%|β–ˆβ–ˆβ–‰ | 58/199 [00:36<01:12, 1.94it/s] 31%|β–ˆβ–ˆβ–ˆ | 61/199 [00:36<00:42, 3.27it/s] [2024-06-06 23:45:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.2.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
31%|β–ˆβ–ˆβ–ˆ | 61/199 [00:37<00:42, 3.27it/s] 31%|β–ˆβ–ˆβ–ˆ | 62/199 [00:37<00:37, 3.70it/s] [2024-06-06 23:45:23] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.input_layernorm.weight", shape: (3584,), dtype: float16
31%|β–ˆβ–ˆβ–ˆ | 62/199 [00:37<00:37, 3.70it/s] [2024-06-06 23:45:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
31%|β–ˆβ–ˆβ–ˆ | 62/199 [00:37<00:37, 3.70it/s] 32%|β–ˆβ–ˆβ–ˆβ– | 64/199 [00:37<00:38, 3.53it/s] [2024-06-06 23:45:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
32%|β–ˆβ–ˆβ–ˆβ– | 64/199 [00:38<00:38, 3.53it/s] 33%|β–ˆβ–ˆβ–ˆβ–Ž | 65/199 [00:39<01:11, 1.87it/s] [2024-06-06 23:45:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.post_attention_layernorm.weight", shape: (3584,), dtype: float16
33%|β–ˆβ–ˆβ–ˆβ–Ž | 65/199 [00:39<01:11, 1.87it/s] [2024-06-06 23:45:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.c_attn.bias", shape: (4608,), dtype: float16
33%|β–ˆβ–ˆβ–ˆβ–Ž | 65/199 [00:39<01:11, 1.87it/s] [2024-06-06 23:45:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
33%|β–ˆβ–ˆβ–ˆβ–Ž | 65/199 [00:39<01:11, 1.87it/s] 34%|β–ˆβ–ˆβ–ˆβ– | 68/199 [00:39<00:41, 3.16it/s] [2024-06-06 23:45:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.3.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
34%|β–ˆβ–ˆβ–ˆβ– | 68/199 [00:39<00:41, 3.16it/s] 35%|β–ˆβ–ˆβ–ˆβ– | 69/199 [00:39<00:36, 3.59it/s] [2024-06-06 23:45:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.input_layernorm.weight", shape: (3584,), dtype: float16
35%|β–ˆβ–ˆβ–ˆβ– | 69/199 [00:39<00:36, 3.59it/s] [2024-06-06 23:45:26] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
35%|β–ˆβ–ˆβ–ˆβ– | 69/199 [00:39<00:36, 3.59it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 71/199 [00:40<00:37, 3.45it/s] [2024-06-06 23:45:28] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
36%|β–ˆβ–ˆβ–ˆβ–Œ | 71/199 [00:41<00:37, 3.45it/s] 36%|β–ˆβ–ˆβ–ˆβ–Œ | 72/199 [00:42<01:29, 1.42it/s] [2024-06-06 23:45:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.post_attention_layernorm.weight", shape: (3584,), dtype: float16
36%|β–ˆβ–ˆβ–ˆβ–Œ | 72/199 [00:42<01:29, 1.42it/s] [2024-06-06 23:45:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.c_attn.bias", shape: (4608,), dtype: float16
36%|β–ˆβ–ˆβ–ˆβ–Œ | 72/199 [00:42<01:29, 1.42it/s] [2024-06-06 23:45:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
36%|β–ˆβ–ˆβ–ˆβ–Œ | 72/199 [00:42<01:29, 1.42it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 75/199 [00:42<00:50, 2.44it/s] [2024-06-06 23:45:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.4.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
38%|β–ˆβ–ˆβ–ˆβ–Š | 75/199 [00:42<00:50, 2.44it/s] 38%|β–ˆβ–ˆβ–ˆβ–Š | 76/199 [00:42<00:43, 2.81it/s] [2024-06-06 23:45:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.input_layernorm.weight", shape: (3584,), dtype: float16
38%|β–ˆβ–ˆβ–ˆβ–Š | 76/199 [00:42<00:43, 2.81it/s] [2024-06-06 23:45:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
38%|β–ˆβ–ˆβ–ˆβ–Š | 76/199 [00:43<00:43, 2.81it/s] 39%|β–ˆβ–ˆβ–ˆβ–‰ | 78/199 [00:43<00:40, 2.95it/s] [2024-06-06 23:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
39%|β–ˆβ–ˆβ–ˆβ–‰ | 78/199 [00:45<00:40, 2.95it/s] 40%|β–ˆβ–ˆβ–ˆβ–‰ | 79/199 [00:45<01:29, 1.34it/s] [2024-06-06 23:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.post_attention_layernorm.weight", shape: (3584,), dtype: float16
40%|β–ˆβ–ˆβ–ˆβ–‰ | 79/199 [00:45<01:29, 1.34it/s] [2024-06-06 23:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.self_attn.c_attn.bias", shape: (4608,), dtype: float16
40%|β–ˆβ–ˆβ–ˆβ–‰ | 79/199 [00:45<01:29, 1.34it/s] [2024-06-06 23:45:32] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
40%|β–ˆβ–ˆβ–ˆβ–‰ | 79/199 [00:46<01:29, 1.34it/s] 41%|β–ˆβ–ˆβ–ˆβ–ˆ | 82/199 [00:46<00:50, 2.30it/s] [2024-06-06 23:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.5.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
41%|β–ˆβ–ˆβ–ˆβ–ˆ | 82/199 [00:46<00:50, 2.30it/s] 42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 83/199 [00:46<00:43, 2.66it/s] [2024-06-06 23:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.self_attn.c_attn.bias", shape: (4608,), dtype: float16
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 83/199 [00:46<00:43, 2.66it/s] [2024-06-06 23:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
42%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 83/199 [00:46<00:43, 2.66it/s] 43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 85/199 [00:46<00:31, 3.65it/s] [2024-06-06 23:45:33] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 85/199 [00:46<00:31, 3.65it/s] [2024-06-06 23:45:33] INFO huggingface_loader.py:197: Unloading HF weight file: /models/Qwen2-7B-Instruct/model-00001-of-00004.safetensors
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 85/199 [00:46<00:31, 3.65it/s] [2024-06-06 23:45:33] INFO huggingface_loader.py:185: Loading HF parameters from: /models/Qwen2-7B-Instruct/model-00002-of-00004.safetensors
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 85/199 [00:46<00:31, 3.65it/s] [2024-06-06 23:45:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.input_layernorm.weight", shape: (3584,), dtype: float16
43%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 85/199 [00:50<00:31, 3.65it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 87/199 [00:50<01:40, 1.12it/s] [2024-06-06 23:45:37] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
44%|β–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 87/199 [00:50<01:40, 1.12it/s] 44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 88/199 [00:51<01:33, 1.19it/s] [2024-06-06 23:45:39] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
44%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 88/199 [00:52<01:33, 1.19it/s] 45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 89/199 [00:53<01:55, 1.05s/it] [2024-06-06 23:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.post_attention_layernorm.weight", shape: (3584,), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 89/199 [00:53<01:55, 1.05s/it] [2024-06-06 23:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.c_attn.bias", shape: (4608,), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 89/199 [00:53<01:55, 1.05s/it] [2024-06-06 23:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
45%|β–ˆβ–ˆβ–ˆβ–ˆβ– | 89/199 [00:53<01:55, 1.05s/it] 46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 92/199 [00:53<01:01, 1.73it/s] [2024-06-06 23:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.10.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
46%|β–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 92/199 [00:53<01:01, 1.73it/s] 47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 93/199 [00:53<00:51, 2.05it/s] [2024-06-06 23:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.input_layernorm.weight", shape: (3584,), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 93/199 [00:53<00:51, 2.05it/s] [2024-06-06 23:45:40] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
47%|β–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 93/199 [00:53<00:51, 2.05it/s] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 95/199 [00:54<00:44, 2.35it/s] [2024-06-06 23:45:41] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 95/199 [00:54<00:44, 2.35it/s] 48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 96/199 [00:55<01:02, 1.65it/s] [2024-06-06 23:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.post_attention_layernorm.weight", shape: (3584,), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 96/199 [00:55<01:02, 1.65it/s] [2024-06-06 23:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.c_attn.bias", shape: (4608,), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 96/199 [00:55<01:02, 1.65it/s] [2024-06-06 23:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
48%|β–ˆβ–ˆβ–ˆβ–ˆβ–Š | 96/199 [00:55<01:02, 1.65it/s] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 99/199 [00:55<00:35, 2.83it/s] [2024-06-06 23:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.11.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 99/199 [00:55<00:35, 2.83it/s] 50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 100/199 [00:55<00:30, 3.24it/s] [2024-06-06 23:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.input_layernorm.weight", shape: (3584,), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 100/199 [00:55<00:30, 3.24it/s] [2024-06-06 23:45:42] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
50%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 100/199 [00:55<00:30, 3.24it/s] 51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 102/199 [00:56<00:29, 3.24it/s] [2024-06-06 23:45:43] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
51%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 102/199 [00:56<00:29, 3.24it/s] 52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 103/199 [00:57<00:50, 1.91it/s] [2024-06-06 23:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.post_attention_layernorm.weight", shape: (3584,), dtype: float16
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 103/199 [00:57<00:50, 1.91it/s] [2024-06-06 23:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.c_attn.bias", shape: (4608,), dtype: float16
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 103/199 [00:57<00:50, 1.91it/s] [2024-06-06 23:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
52%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 103/199 [00:57<00:50, 1.91it/s] 53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 106/199 [00:57<00:28, 3.22it/s] [2024-06-06 23:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.12.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
53%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 106/199 [00:57<00:28, 3.22it/s] 54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 107/199 [00:57<00:25, 3.65it/s] [2024-06-06 23:45:44] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.input_layernorm.weight", shape: (3584,), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 107/199 [00:57<00:25, 3.65it/s] [2024-06-06 23:45:45] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
54%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 107/199 [00:58<00:25, 3.65it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 109/199 [00:58<00:25, 3.52it/s] [2024-06-06 23:45:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 109/199 [01:00<00:25, 3.52it/s] 55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 110/199 [01:00<01:02, 1.42it/s] [2024-06-06 23:45:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.post_attention_layernorm.weight", shape: (3584,), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 110/199 [01:00<01:02, 1.42it/s] [2024-06-06 23:45:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.c_attn.bias", shape: (4608,), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 110/199 [01:01<01:02, 1.42it/s] [2024-06-06 23:45:47] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
55%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 110/199 [01:01<01:02, 1.42it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 113/199 [01:01<00:35, 2.44it/s] [2024-06-06 23:45:48] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.13.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 113/199 [01:01<00:35, 2.44it/s] 57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 114/199 [01:01<00:30, 2.82it/s] [2024-06-06 23:45:48] INFO huggingface_loader.py:185: Loading HF parameters from: /models/Qwen2-7B-Instruct/model-00003-of-00004.safetensors
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 114/199 [01:01<00:30, 2.82it/s] [2024-06-06 23:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
57%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 114/199 [01:06<00:30, 2.82it/s] 58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 115/199 [01:07<02:02, 1.45s/it] [2024-06-06 23:45:53] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.c_attn.bias", shape: (4608,), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 115/199 [01:07<02:02, 1.45s/it] [2024-06-06 23:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
58%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 115/199 [01:07<02:02, 1.45s/it] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 117/199 [01:07<01:18, 1.05it/s] [2024-06-06 23:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 117/199 [01:07<01:18, 1.05it/s] 59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 118/199 [01:07<01:03, 1.28it/s] [2024-06-06 23:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.input_layernorm.weight", shape: (3584,), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 118/199 [01:07<01:03, 1.28it/s] [2024-06-06 23:45:54] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 118/199 [01:07<01:03, 1.28it/s] 60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 120/199 [01:08<00:50, 1.57it/s] [2024-06-06 23:45:56] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
60%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 120/199 [01:09<00:50, 1.57it/s] 61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 121/199 [01:10<01:15, 1.03it/s] [2024-06-06 23:45:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.6.post_attention_layernorm.weight", shape: (3584,), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 121/199 [01:10<01:15, 1.03it/s] [2024-06-06 23:45:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.input_layernorm.weight", shape: (3584,), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 121/199 [01:10<01:15, 1.03it/s] [2024-06-06 23:45:57] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
61%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 121/199 [01:10<01:15, 1.03it/s] 62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 124/199 [01:11<00:45, 1.63it/s] [2024-06-06 23:45:59] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
62%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 124/199 [01:12<00:45, 1.63it/s] 63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 125/199 [01:13<01:07, 1.09it/s] [2024-06-06 23:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.post_attention_layernorm.weight", shape: (3584,), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 125/199 [01:13<01:07, 1.09it/s] [2024-06-06 23:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.self_attn.c_attn.bias", shape: (4608,), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 125/199 [01:13<01:07, 1.09it/s] [2024-06-06 23:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
63%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 125/199 [01:13<01:07, 1.09it/s] 64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 128/199 [01:13<00:38, 1.86it/s] [2024-06-06 23:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.7.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
64%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 128/199 [01:13<00:38, 1.86it/s] 65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 129/199 [01:13<00:32, 2.17it/s] [2024-06-06 23:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.input_layernorm.weight", shape: (3584,), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 129/199 [01:13<00:32, 2.17it/s] [2024-06-06 23:46:00] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
65%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 129/199 [01:13<00:32, 2.17it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 131/199 [01:14<00:27, 2.45it/s] [2024-06-06 23:46:02] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 131/199 [01:15<00:27, 2.45it/s] 66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 132/199 [01:16<00:51, 1.31it/s] [2024-06-06 23:46:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.post_attention_layernorm.weight", shape: (3584,), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 132/199 [01:16<00:51, 1.31it/s] [2024-06-06 23:46:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.self_attn.c_attn.bias", shape: (4608,), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 132/199 [01:16<00:51, 1.31it/s] [2024-06-06 23:46:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
66%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 132/199 [01:16<00:51, 1.31it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 135/199 [01:16<00:28, 2.25it/s] [2024-06-06 23:46:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.8.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 135/199 [01:16<00:28, 2.25it/s] 68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 136/199 [01:16<00:24, 2.60it/s] [2024-06-06 23:46:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.input_layernorm.weight", shape: (3584,), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 136/199 [01:16<00:24, 2.60it/s] [2024-06-06 23:46:03] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
68%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 136/199 [01:16<00:24, 2.60it/s] 69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 138/199 [01:17<00:21, 2.79it/s] [2024-06-06 23:46:05] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
69%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 138/199 [01:18<00:21, 2.79it/s] 70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 139/199 [01:19<00:40, 1.48it/s] [2024-06-06 23:46:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.post_attention_layernorm.weight", shape: (3584,), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 139/199 [01:19<00:40, 1.48it/s] [2024-06-06 23:46:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.self_attn.c_attn.bias", shape: (4608,), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 139/199 [01:19<00:40, 1.48it/s] [2024-06-06 23:46:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
70%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 139/199 [01:19<00:40, 1.48it/s] 71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 142/199 [01:19<00:22, 2.52it/s] [2024-06-06 23:46:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.9.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
71%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 142/199 [01:19<00:22, 2.52it/s] 72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 143/199 [01:19<00:19, 2.91it/s] [2024-06-06 23:46:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.input_layernorm.weight", shape: (3584,), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 143/199 [01:19<00:19, 2.91it/s] [2024-06-06 23:46:06] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
72%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 143/199 [01:19<00:19, 2.91it/s] 73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 145/199 [01:20<00:17, 3.04it/s] [2024-06-06 23:46:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.14.post_attention_layernorm.weight", shape: (3584,), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 145/199 [01:20<00:17, 3.04it/s] [2024-06-06 23:46:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.input_layernorm.weight", shape: (3584,), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 145/199 [01:20<00:17, 3.04it/s] [2024-06-06 23:46:07] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 145/199 [01:20<00:17, 3.04it/s] 74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 148/199 [01:20<00:14, 3.62it/s] [2024-06-06 23:46:09] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
74%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 148/199 [01:22<00:14, 3.62it/s] 75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 149/199 [01:23<00:32, 1.56it/s] [2024-06-06 23:46:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.post_attention_layernorm.weight", shape: (3584,), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 149/199 [01:23<00:32, 1.56it/s] [2024-06-06 23:46:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.c_attn.bias", shape: (4608,), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 149/199 [01:23<00:32, 1.56it/s] [2024-06-06 23:46:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
75%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 149/199 [01:23<00:32, 1.56it/s] 76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 152/199 [01:23<00:18, 2.51it/s] [2024-06-06 23:46:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.15.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
76%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 152/199 [01:23<00:18, 2.51it/s] 77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 153/199 [01:23<00:16, 2.86it/s] [2024-06-06 23:46:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.input_layernorm.weight", shape: (3584,), dtype: float16
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 153/199 [01:23<00:16, 2.86it/s] [2024-06-06 23:46:10] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 153/199 [01:23<00:16, 2.86it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 155/199 [01:24<00:14, 2.96it/s] [2024-06-06 23:46:12] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 155/199 [01:25<00:14, 2.96it/s] 78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 156/199 [01:26<00:29, 1.45it/s] [2024-06-06 23:46:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.post_attention_layernorm.weight", shape: (3584,), dtype: float16
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 156/199 [01:26<00:29, 1.45it/s] [2024-06-06 23:46:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.c_attn.bias", shape: (4608,), dtype: float16
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 156/199 [01:26<00:29, 1.45it/s] [2024-06-06 23:46:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
78%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 156/199 [01:26<00:29, 1.45it/s] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 159/199 [01:26<00:16, 2.44it/s] [2024-06-06 23:46:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.16.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 159/199 [01:26<00:16, 2.44it/s] 80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 160/199 [01:26<00:13, 2.80it/s] [2024-06-06 23:46:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.input_layernorm.weight", shape: (3584,), dtype: float16
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 160/199 [01:26<00:13, 2.80it/s] [2024-06-06 23:46:13] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
80%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 160/199 [01:26<00:13, 2.80it/s] 81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 162/199 [01:27<00:12, 2.93it/s] [2024-06-06 23:46:15] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
81%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 162/199 [01:28<00:12, 2.93it/s] 82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 163/199 [01:29<00:25, 1.40it/s] [2024-06-06 23:46:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.post_attention_layernorm.weight", shape: (3584,), dtype: float16
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 163/199 [01:29<00:25, 1.40it/s] [2024-06-06 23:46:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.c_attn.bias", shape: (4608,), dtype: float16
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 163/199 [01:29<00:25, 1.40it/s] [2024-06-06 23:46:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
82%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 163/199 [01:29<00:25, 1.40it/s] 83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 166/199 [01:29<00:13, 2.39it/s] [2024-06-06 23:46:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.17.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
83%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž | 166/199 [01:29<00:13, 2.39it/s] 84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 167/199 [01:29<00:11, 2.76it/s] [2024-06-06 23:46:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.input_layernorm.weight", shape: (3584,), dtype: float16
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 167/199 [01:29<00:11, 2.76it/s] [2024-06-06 23:46:16] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
84%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 167/199 [01:30<00:11, 2.76it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 169/199 [01:30<00:10, 2.94it/s] [2024-06-06 23:46:18] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– | 169/199 [01:31<00:10, 2.94it/s] 85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 170/199 [01:32<00:19, 1.49it/s] [2024-06-06 23:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.post_attention_layernorm.weight", shape: (3584,), dtype: float16
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 170/199 [01:32<00:19, 1.49it/s] [2024-06-06 23:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.c_attn.bias", shape: (4608,), dtype: float16
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 170/199 [01:32<00:19, 1.49it/s] [2024-06-06 23:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
85%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ | 170/199 [01:32<00:19, 1.49it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 173/199 [01:32<00:10, 2.54it/s] [2024-06-06 23:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.18.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 173/199 [01:32<00:10, 2.54it/s] 87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 174/199 [01:32<00:08, 2.93it/s] [2024-06-06 23:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.input_layernorm.weight", shape: (3584,), dtype: float16
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 174/199 [01:32<00:08, 2.93it/s] [2024-06-06 23:46:19] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
87%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹ | 174/199 [01:33<00:08, 2.93it/s] 88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 176/199 [01:33<00:07, 3.02it/s] [2024-06-06 23:46:21] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
88%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š | 176/199 [01:34<00:07, 3.02it/s] 89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 177/199 [01:35<00:13, 1.64it/s] [2024-06-06 23:46:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.post_attention_layernorm.weight", shape: (3584,), dtype: float16
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 177/199 [01:35<00:13, 1.64it/s] [2024-06-06 23:46:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.c_attn.bias", shape: (4608,), dtype: float16
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 177/199 [01:35<00:13, 1.64it/s] [2024-06-06 23:46:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
89%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰ | 177/199 [01:35<00:13, 1.64it/s] 90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 180/199 [01:35<00:06, 2.80it/s] [2024-06-06 23:46:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.19.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
90%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 180/199 [01:35<00:06, 2.80it/s] 91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 181/199 [01:35<00:05, 3.21it/s] [2024-06-06 23:46:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.input_layernorm.weight", shape: (3584,), dtype: float16
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 181/199 [01:35<00:05, 3.21it/s] [2024-06-06 23:46:22] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
91%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ | 181/199 [01:35<00:05, 3.21it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 183/199 [01:36<00:04, 3.25it/s] [2024-06-06 23:46:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 183/199 [01:37<00:04, 3.25it/s] 92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 184/199 [01:37<00:09, 1.65it/s] [2024-06-06 23:46:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.post_attention_layernorm.weight", shape: (3584,), dtype: float16
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 184/199 [01:37<00:09, 1.65it/s] [2024-06-06 23:46:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.c_attn.bias", shape: (4608,), dtype: float16
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 184/199 [01:37<00:09, 1.65it/s] [2024-06-06 23:46:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 184/199 [01:37<00:09, 1.65it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 187/199 [01:38<00:04, 2.82it/s] [2024-06-06 23:46:24] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.20.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 187/199 [01:38<00:04, 2.82it/s] 94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 188/199 [01:38<00:03, 3.23it/s] [2024-06-06 23:46:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.input_layernorm.weight", shape: (3584,), dtype: float16
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 188/199 [01:38<00:03, 3.23it/s] [2024-06-06 23:46:25] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.mlp.down_proj.weight", shape: (3584, 18944), dtype: float16
94%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–| 188/199 [01:38<00:03, 3.23it/s] 95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 190/199 [01:38<00:02, 3.27it/s] [2024-06-06 23:46:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
95%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 190/199 [01:40<00:02, 3.27it/s] 96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 191/199 [01:40<00:05, 1.51it/s] [2024-06-06 23:46:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.post_attention_layernorm.weight", shape: (3584,), dtype: float16
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 191/199 [01:40<00:05, 1.51it/s] [2024-06-06 23:46:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.self_attn.c_attn.bias", shape: (4608,), dtype: float16
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 191/199 [01:40<00:05, 1.51it/s] [2024-06-06 23:46:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 191/199 [01:40<00:05, 1.51it/s] 97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 194/199 [01:41<00:01, 2.58it/s] [2024-06-06 23:46:27] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.21.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
97%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 194/199 [01:41<00:01, 2.58it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 195/199 [01:41<00:01, 2.97it/s] [2024-06-06 23:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.mlp.gate_up_proj.weight", shape: (37888, 3584), dtype: float16
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 195/199 [01:42<00:01, 2.97it/s] 98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 196/199 [01:43<00:01, 1.55it/s] [2024-06-06 23:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.self_attn.c_attn.bias", shape: (4608,), dtype: float16
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 196/199 [01:43<00:01, 1.55it/s] [2024-06-06 23:46:29] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.self_attn.c_attn.weight", shape: (4608, 3584), dtype: float16
98%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 196/199 [01:43<00:01, 1.55it/s] 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 198/199 [01:43<00:00, 2.27it/s] [2024-06-06 23:46:30] INFO huggingface_loader.py:175: [Not quantized] Parameter: "model.layers.22.self_attn.o_proj.weight", shape: (3584, 3584), dtype: float16
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 198/199 [01:43<00:00, 2.27it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 199/199 [01:43<00:00, 2.70it/s] 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 199/199 [01:43<00:00, 1.93it/s]
[2024-06-06 23:46:30] INFO huggingface_loader.py:197: Unloading HF weight file: /models/Qwen2-7B-Instruct/model-00002-of-00004.safetensors
[2024-06-06 23:46:30] INFO huggingface_loader.py:197: Unloading HF weight file: /models/Qwen2-7B-Instruct/model-00003-of-00004.safetensors
[2024-06-06 23:46:31] INFO stats.py:77: Time usage: HF loading: 16.631 sec; Pre-quantization mapping: 44.010 sec; Quantization: 0.000 sec
[2024-06-06 23:46:31] INFO stats.py:91: RAM usage: Peak RAM: 14.397 GB. Total bytes loaded from disk: 28.370 GB
[2024-06-06 23:46:31] INFO convert_weight.py:155: Parameter size after quantization: 14.185 GB
[2024-06-06 23:46:31] INFO convert_weight.py:160: Total parameters: 7,615,616,512
[2024-06-06 23:46:31] INFO convert_weight.py:161: Bits per parameter: 16.000
[2024-06-06 23:46:31] INFO convert_weight.py:166: Saved to directory: /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC
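The reported figures are internally consistent: 7,615,616,512 parameters at 2 bytes each (float16) gives the 14.185 GB parameter size above (the stat divides bytes by 2^30, i.e. GiB), and 16.000 bits per parameter follows directly. A quick check in Python:

params = 7_615_616_512
fp16_bytes = params * 2            # float16 stores 2 bytes per parameter
print(fp16_bytes / 2**30)          # ~14.185, matching "Parameter size after quantization"
print(fp16_bytes * 8 / params)     # 16.0 bits per parameter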
All finished, 114 total shards committed, record saved to /models/mlc-delivery/hf/mlc-ai/Qwen2-7B-Instruct-q0f16-MLC/ndarray-cache.json
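Once the directory is uploaded, the delivered artifacts (converted weights, tokenizer files, mlc-chat-config.json, and ndarray-cache.json) are what MLC LLM consumes at runtime. A hedged usage sketch with the MLCEngine Python API, assuming mlc_llm and a matching GPU runtime are installed and the output above is published as the mlc-ai/Qwen2-7B-Instruct-q0f16-MLC repo (a local path to the output directory should work the same way):

from mlc_llm import MLCEngine

model = "HF://mlc-ai/Qwen2-7B-Instruct-q0f16-MLC"
engine = MLCEngine(model)

# OpenAI-style streaming chat completion against the converted model.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()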