05/13/2024 15:44:29 - INFO - transformers.tokenization_utils_base - loading file tokenizer.model from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/tokenizer.model
05/13/2024 15:44:29 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/tokenizer.json
05/13/2024 15:44:29 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
05/13/2024 15:44:29 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/special_tokens_map.json
05/13/2024 15:44:29 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/tokenizer_config.json
05/13/2024 15:44:29 - WARNING - transformers.models.llama.tokenization_llama_fast - You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
05/13/2024 15:44:29 - INFO - llmtuner.data.template - Add pad token:
05/13/2024 15:44:29 - INFO - llmtuner.data.loader - Loading dataset svjack/dpo_zh_emoji_rj_en...
05/13/2024 15:44:36 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 15:44:36 - INFO - transformers.configuration_utils - Model config MistralConfig { "_name_or_path": "alpindale/Mistral-7B-v0.2-hf", "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 15:44:36 - INFO - llmtuner.model.utils.quantization - Quantizing model to 4 bit.
05/13/2024 15:44:36 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/model.safetensors.index.json
05/13/2024 15:44:36 - INFO - transformers.modeling_utils - Instantiating MistralForCausalLM model under default dtype torch.float16.
05/13/2024 15:44:36 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 }
05/13/2024 15:44:45 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing MistralForCausalLM.
05/13/2024 15:44:45 - INFO - transformers.modeling_utils - All the weights of MistralForCausalLM were initialized from the model checkpoint at alpindale/Mistral-7B-v0.2-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
05/13/2024 15:44:46 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/generation_config.json
05/13/2024 15:44:46 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2 }
05/13/2024 15:44:46 - INFO - llmtuner.model.utils.checkpointing - Gradient checkpointing enabled.
05/13/2024 15:44:46 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/13/2024 15:44:46 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
05/13/2024 15:44:47 - INFO - llmtuner.model.loader - trainable params: 3407872 || all params: 7245139968 || trainable%: 0.0470
05/13/2024 15:44:47 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
05/13/2024 15:44:47 - INFO - transformers.trainer - Using auto half precision backend
05/13/2024 15:44:47 - INFO - transformers.trainer - ***** Running training *****
05/13/2024 15:44:47 - INFO - transformers.trainer - Num examples = 2,449
05/13/2024 15:44:47 - INFO - transformers.trainer - Num Epochs = 3
05/13/2024 15:44:47 - INFO - transformers.trainer - Instantaneous batch size per device = 1
05/13/2024 15:44:47 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 8
05/13/2024 15:44:47 - INFO - transformers.trainer - Gradient Accumulation steps = 8
05/13/2024 15:44:47 - INFO - transformers.trainer - Total optimization steps = 918
05/13/2024 15:44:47 - INFO - transformers.trainer - Number of trainable parameters = 3,407,872
05/13/2024 15:45:44 - INFO - llmtuner.extras.callbacks - {'loss': 1.1780, 'learning_rate': 4.9996e-05, 'epoch': 0.02}
05/13/2024 15:46:45 - INFO - llmtuner.extras.callbacks - {'loss': 1.0927, 'learning_rate': 4.9985e-05, 'epoch': 0.03}
05/13/2024 15:47:42 - INFO - llmtuner.extras.callbacks - {'loss': 1.0993, 'learning_rate': 4.9967e-05, 'epoch': 0.05}
05/13/2024 15:48:39 - INFO - llmtuner.extras.callbacks - {'loss': 1.1402, 'learning_rate': 4.9941e-05, 'epoch': 0.07}
05/13/2024 15:49:33 - INFO - llmtuner.extras.callbacks - {'loss': 1.0937, 'learning_rate': 4.9909e-05, 'epoch': 0.08}
05/13/2024 15:50:26 - INFO - llmtuner.extras.callbacks - {'loss': 1.1745, 'learning_rate': 4.9868e-05, 'epoch': 0.10}
05/13/2024 15:51:21 - INFO - llmtuner.extras.callbacks - {'loss': 1.0336, 'learning_rate': 4.9821e-05, 'epoch': 0.11}
05/13/2024 15:52:17 - INFO - llmtuner.extras.callbacks - {'loss': 0.9642, 'learning_rate': 4.9766e-05, 'epoch': 0.13}
05/13/2024 15:53:14 - INFO - llmtuner.extras.callbacks - {'loss': 0.9699, 'learning_rate': 4.9704e-05, 'epoch': 0.15}
05/13/2024 15:54:14 - INFO - llmtuner.extras.callbacks - {'loss': 1.0223, 'learning_rate': 4.9635e-05, 'epoch': 0.16}
05/13/2024 15:55:12 - INFO - llmtuner.extras.callbacks - {'loss': 1.0148, 'learning_rate': 4.9558e-05, 'epoch': 0.18}
05/13/2024 15:56:09 - INFO - llmtuner.extras.callbacks - {'loss': 0.9509, 'learning_rate': 4.9475e-05, 'epoch': 0.20}
05/13/2024 15:57:04 - INFO - llmtuner.extras.callbacks - {'loss': 1.0147, 'learning_rate': 4.9384e-05, 'epoch': 0.21}
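The header above pins down the whole configuration of this run: alpindale/Mistral-7B-v0.2-hf quantized to 4 bits, LoRA fine-tuning with 3,407,872 trainable parameters (consistent with rank-8 adapters on the q_proj and v_proj projections of all 32 layers), a per-device batch size of 1 with 8 gradient-accumulation steps, 3 epochs over 2,449 examples (918 optimization steps), and a learning rate that starts near 5e-5 and decays towards zero roughly as a cosine schedule. The sketch below reconstructs an equivalent setup directly with transformers, peft, and bitsandbytes rather than through the llmtuner launcher that produced this log; the LoRA rank and target modules, the scheduler type, the logging and save intervals, and the dataset handling are inferences from the log, not settings copied from the original configuration.

```python
# Hypothetical reconstruction of the logged run with transformers + peft + bitsandbytes.
# All hyperparameters are inferred from the log header; they are not read from the real config.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "alpindale/Mistral-7B-v0.2-hf"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # the log notes that a pad token is added

# "Quantizing model to 4 bit." / "Instantiating ... under default dtype torch.float16."
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
model = prepare_model_for_kbit_training(model)  # enables gradient checkpointing, as in the log

# Rank-8 LoRA on q_proj/v_proj yields exactly 3,407,872 trainable parameters on this model.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Dataset named in the log; tokenization and chat templating are omitted here and would have
# to be applied before training (the real run used llmtuner's template pipeline).
dataset = load_dataset("svjack/dpo_zh_emoji_rj_en", split="train")

args = TrainingArguments(
    output_dir="saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20",
    per_device_train_batch_size=1,   # Instantaneous batch size per device = 1
    gradient_accumulation_steps=8,   # Gradient Accumulation steps = 8
    num_train_epochs=3.0,            # Num Epochs = 3
    learning_rate=5e-5,              # matches the logged peak of ~5.0e-05
    lr_scheduler_type="cosine",      # consistent with the decay seen in the log
    logging_steps=5,                 # ~183 log lines over 918 steps
    save_steps=100,                  # checkpoints appear every 100 steps
    fp16=True,                       # "Using auto half precision backend"
)
trainer = Trainer(model=model, args=args, train_dataset=dataset,
                  data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
# trainer.train()  # would start a run comparable to the log that follows
```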
05/13/2024 15:57:57 - INFO - llmtuner.extras.callbacks - {'loss': 0.9836, 'learning_rate': 4.9286e-05, 'epoch': 0.23}
05/13/2024 15:58:53 - INFO - llmtuner.extras.callbacks - {'loss': 1.0454, 'learning_rate': 4.9181e-05, 'epoch': 0.24}
05/13/2024 15:59:49 - INFO - llmtuner.extras.callbacks - {'loss': 1.0294, 'learning_rate': 4.9069e-05, 'epoch': 0.26}
05/13/2024 16:00:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.0032, 'learning_rate': 4.8950e-05, 'epoch': 0.28}
05/13/2024 16:01:39 - INFO - llmtuner.extras.callbacks - {'loss': 0.9995, 'learning_rate': 4.8824e-05, 'epoch': 0.29}
05/13/2024 16:02:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.9903, 'learning_rate': 4.8690e-05, 'epoch': 0.31}
05/13/2024 16:03:24 - INFO - llmtuner.extras.callbacks - {'loss': 1.1427, 'learning_rate': 4.8550e-05, 'epoch': 0.33}
05/13/2024 16:03:24 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-100
05/13/2024 16:03:26 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 16:03:26 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 16:03:26 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-100/tokenizer_config.json
05/13/2024 16:03:26 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-100/special_tokens_map.json
05/13/2024 16:04:30 - INFO - llmtuner.extras.callbacks - {'loss': 0.9410, 'learning_rate': 4.8403e-05, 'epoch': 0.34}
05/13/2024 16:05:29 - INFO - llmtuner.extras.callbacks - {'loss': 0.9128, 'learning_rate': 4.8249e-05, 'epoch': 0.36}
05/13/2024 16:06:23 - INFO - llmtuner.extras.callbacks - {'loss': 1.0076, 'learning_rate': 4.8089e-05, 'epoch': 0.38}
05/13/2024 16:07:20 - INFO - llmtuner.extras.callbacks - {'loss': 0.9352, 'learning_rate': 4.7921e-05, 'epoch': 0.39}
05/13/2024 16:08:13 - INFO - llmtuner.extras.callbacks - {'loss': 0.9553, 'learning_rate': 4.7747e-05, 'epoch': 0.41}
05/13/2024 16:09:08 - INFO - llmtuner.extras.callbacks - {'loss': 0.9682, 'learning_rate': 4.7566e-05, 'epoch': 0.42}
05/13/2024 16:10:07 - INFO - llmtuner.extras.callbacks - {'loss': 0.9498, 'learning_rate': 4.7379e-05, 'epoch': 0.44}
05/13/2024 16:11:03 - INFO - llmtuner.extras.callbacks - {'loss': 0.9698, 'learning_rate': 4.7185e-05, 'epoch': 0.46}
05/13/2024 16:12:02 - INFO - llmtuner.extras.callbacks - {'loss': 0.9030, 'learning_rate': 4.6985e-05, 'epoch': 0.47}
05/13/2024 16:13:02 - INFO - llmtuner.extras.callbacks - {'loss': 0.8838, 'learning_rate': 4.6778e-05, 'epoch': 0.49}
05/13/2024 16:13:55 - INFO - llmtuner.extras.callbacks - {'loss': 0.9498, 'learning_rate': 4.6565e-05, 'epoch': 0.51}
05/13/2024 16:14:57 - INFO - llmtuner.extras.callbacks - {'loss': 0.8243, 'learning_rate': 4.6345e-05, 'epoch': 0.52}
05/13/2024 16:15:54 - INFO - llmtuner.extras.callbacks - {'loss': 0.9163, 'learning_rate': 4.6119e-05, 'epoch': 0.54}
05/13/2024 16:16:49 - INFO - llmtuner.extras.callbacks - {'loss': 0.9406, 'learning_rate': 4.5887e-05, 'epoch': 0.56}
05/13/2024 16:17:49 - INFO - llmtuner.extras.callbacks - {'loss': 0.8871, 'learning_rate': 4.5649e-05, 'epoch': 0.57}
05/13/2024 16:18:43 - INFO - llmtuner.extras.callbacks - {'loss': 0.8753, 'learning_rate': 4.5405e-05, 'epoch': 0.59}
05/13/2024 16:19:44 - INFO - llmtuner.extras.callbacks - {'loss': 0.8625, 'learning_rate': 4.5155e-05, 'epoch': 0.60}
05/13/2024 16:20:40 - INFO - llmtuner.extras.callbacks - {'loss': 0.9149, 'learning_rate': 4.4899e-05, 'epoch': 0.62}
05/13/2024 16:21:35 - INFO - llmtuner.extras.callbacks - {'loss': 0.8582, 'learning_rate': 4.4637e-05, 'epoch': 0.64}
05/13/2024 16:22:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.9045, 'learning_rate': 4.4369e-05, 'epoch': 0.65}
05/13/2024 16:22:33 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-200
05/13/2024 16:22:34 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 16:22:34 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 16:22:34 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-200/tokenizer_config.json
05/13/2024 16:22:34 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-200/special_tokens_map.json
05/13/2024 16:23:27 - INFO - llmtuner.extras.callbacks - {'loss': 0.9416, 'learning_rate': 4.4096e-05, 'epoch': 0.67}
05/13/2024 16:24:22 - INFO - llmtuner.extras.callbacks - {'loss': 0.8609, 'learning_rate': 4.3817e-05, 'epoch': 0.69}
05/13/2024 16:25:23 - INFO - llmtuner.extras.callbacks - {'loss': 0.8230, 'learning_rate': 4.3533e-05, 'epoch': 0.70}
05/13/2024 16:26:18 - INFO - llmtuner.extras.callbacks - {'loss': 0.9693, 'learning_rate': 4.3243e-05, 'epoch': 0.72}
05/13/2024 16:27:12 - INFO - llmtuner.extras.callbacks - {'loss': 0.8783, 'learning_rate': 4.2948e-05, 'epoch': 0.73}
05/13/2024 16:28:10 - INFO - llmtuner.extras.callbacks - {'loss': 0.9411, 'learning_rate': 4.2708e-05, 'epoch': 0.75}
05/13/2024 16:29:03 - INFO - llmtuner.extras.callbacks - {'loss': 0.7944, 'learning_rate': 4.2403e-05, 'epoch': 0.77}
05/13/2024 16:30:03 - INFO - llmtuner.extras.callbacks - {'loss': 0.8013, 'learning_rate': 4.2094e-05, 'epoch': 0.78}
05/13/2024 16:30:57 - INFO - llmtuner.extras.callbacks - {'loss': 0.9298, 'learning_rate': 4.1779e-05, 'epoch': 0.80}
05/13/2024 16:31:49 - INFO - llmtuner.extras.callbacks - {'loss': 0.8988, 'learning_rate': 4.1460e-05, 'epoch': 0.82}
05/13/2024 16:32:48 - INFO - llmtuner.extras.callbacks - {'loss': 0.8251, 'learning_rate': 4.1135e-05, 'epoch': 0.83}
05/13/2024 16:33:44 - INFO - llmtuner.extras.callbacks - {'loss': 0.8772, 'learning_rate': 4.0806e-05, 'epoch': 0.85}
05/13/2024 16:34:42 - INFO - llmtuner.extras.callbacks - {'loss': 0.8414, 'learning_rate': 4.0472e-05, 'epoch': 0.87}
05/13/2024 16:35:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.8685, 'learning_rate': 4.0134e-05, 'epoch': 0.88}
05/13/2024 16:36:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.8720, 'learning_rate': 3.9791e-05, 'epoch': 0.90}
05/13/2024 16:37:28 - INFO - llmtuner.extras.callbacks - {'loss': 0.8318, 'learning_rate': 3.9444e-05, 'epoch': 0.91}
05/13/2024 16:38:22 - INFO - llmtuner.extras.callbacks - {'loss': 0.8921, 'learning_rate': 3.9093e-05, 'epoch': 0.93}
05/13/2024 16:39:18 - INFO - llmtuner.extras.callbacks - {'loss': 0.8751, 'learning_rate': 3.8738e-05, 'epoch': 0.95}
05/13/2024 16:40:15 - INFO - llmtuner.extras.callbacks - {'loss': 0.8956, 'learning_rate': 3.8378e-05, 'epoch': 0.96}
05/13/2024 16:41:11 - INFO - llmtuner.extras.callbacks - {'loss': 0.8730, 'learning_rate': 3.8015e-05, 'epoch': 0.98}
05/13/2024 16:41:11 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-300
05/13/2024 16:41:13 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 16:41:13 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 16:41:13 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-300/tokenizer_config.json
05/13/2024 16:41:13 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-300/special_tokens_map.json
05/13/2024 16:42:09 - INFO - llmtuner.extras.callbacks - {'loss': 0.8160, 'learning_rate': 3.7648e-05, 'epoch': 1.00}
05/13/2024 16:43:03 - INFO - llmtuner.extras.callbacks - {'loss': 0.8503, 'learning_rate': 3.7277e-05, 'epoch': 1.01}
05/13/2024 16:43:58 - INFO - llmtuner.extras.callbacks - {'loss': 0.8521, 'learning_rate': 3.6903e-05, 'epoch': 1.03}
05/13/2024 16:44:49 - INFO - llmtuner.extras.callbacks - {'loss': 0.9089, 'learning_rate': 3.6525e-05, 'epoch': 1.05}
05/13/2024 16:45:45 - INFO - llmtuner.extras.callbacks - {'loss': 0.7846, 'learning_rate': 3.6143e-05, 'epoch': 1.06}
05/13/2024 16:46:40 - INFO - llmtuner.extras.callbacks - {'loss': 0.8022, 'learning_rate': 3.5759e-05, 'epoch': 1.08}
05/13/2024 16:47:37 - INFO - llmtuner.extras.callbacks - {'loss': 0.8682, 'learning_rate': 3.5371e-05, 'epoch': 1.09}
05/13/2024 16:48:34 - INFO - llmtuner.extras.callbacks - {'loss': 0.8739, 'learning_rate': 3.4980e-05, 'epoch': 1.11}
05/13/2024 16:49:36 - INFO - llmtuner.extras.callbacks - {'loss': 0.8493, 'learning_rate': 3.4587e-05, 'epoch': 1.13}
05/13/2024 16:50:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.8204, 'learning_rate': 3.4190e-05, 'epoch': 1.14}
05/13/2024 16:51:28 - INFO - llmtuner.extras.callbacks - {'loss': 0.9409, 'learning_rate': 3.3791e-05, 'epoch': 1.16}
05/13/2024 16:52:23 - INFO - llmtuner.extras.callbacks - {'loss': 0.8465, 'learning_rate': 3.3390e-05, 'epoch': 1.18}
05/13/2024 16:53:21 - INFO - llmtuner.extras.callbacks - {'loss': 0.8605, 'learning_rate': 3.2985e-05, 'epoch': 1.19}
05/13/2024 16:54:21 - INFO - llmtuner.extras.callbacks - {'loss': 0.7886, 'learning_rate': 3.2579e-05, 'epoch': 1.21}
05/13/2024 16:55:15 - INFO - llmtuner.extras.callbacks - {'loss': 0.8315, 'learning_rate': 3.2170e-05, 'epoch': 1.22}
05/13/2024 16:56:11 - INFO - llmtuner.extras.callbacks - {'loss': 0.8234, 'learning_rate': 3.1759e-05, 'epoch': 1.24}
05/13/2024 16:57:10 - INFO - llmtuner.extras.callbacks - {'loss': 0.8646, 'learning_rate': 3.1346e-05, 'epoch': 1.26}
05/13/2024 16:58:05 - INFO - llmtuner.extras.callbacks - {'loss': 0.8848, 'learning_rate': 3.0932e-05, 'epoch': 1.27}
05/13/2024 16:59:02 - INFO - llmtuner.extras.callbacks - {'loss': 0.8826, 'learning_rate': 3.0515e-05, 'epoch': 1.29}
05/13/2024 17:00:00 - INFO - llmtuner.extras.callbacks - {'loss': 0.7945, 'learning_rate': 3.0097e-05, 'epoch': 1.31}
05/13/2024 17:00:00 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-400
05/13/2024 17:00:02 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 17:00:02 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 17:00:02 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-400/tokenizer_config.json
05/13/2024 17:00:02 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-400/special_tokens_map.json
05/13/2024 17:00:59 - INFO - llmtuner.extras.callbacks - {'loss': 0.9316, 'learning_rate': 2.9678e-05, 'epoch': 1.32}
05/13/2024 17:01:57 - INFO - llmtuner.extras.callbacks - {'loss': 0.8675, 'learning_rate': 2.9257e-05, 'epoch': 1.34}
05/13/2024 17:02:52 - INFO - llmtuner.extras.callbacks - {'loss': 0.8702, 'learning_rate': 2.8835e-05, 'epoch': 1.36}
05/13/2024 17:03:49 - INFO - llmtuner.extras.callbacks - {'loss': 0.8254, 'learning_rate': 2.8412e-05, 'epoch': 1.37}
05/13/2024 17:04:45 - INFO - llmtuner.extras.callbacks - {'loss': 0.8370, 'learning_rate': 2.7987e-05, 'epoch': 1.39}
05/13/2024 17:05:43 - INFO - llmtuner.extras.callbacks - {'loss': 0.7859, 'learning_rate': 2.7562e-05, 'epoch': 1.40}
05/13/2024 17:06:39 - INFO - llmtuner.extras.callbacks - {'loss': 0.8356, 'learning_rate': 2.7136e-05, 'epoch': 1.42}
05/13/2024 17:07:34 - INFO - llmtuner.extras.callbacks - {'loss': 0.8450, 'learning_rate': 2.6710e-05, 'epoch': 1.44}
05/13/2024 17:08:32 - INFO - llmtuner.extras.callbacks - {'loss': 0.7817, 'learning_rate': 2.6283e-05, 'epoch': 1.45}
05/13/2024 17:09:34 - INFO - llmtuner.extras.callbacks - {'loss': 0.8087, 'learning_rate': 2.5855e-05, 'epoch': 1.47}
05/13/2024 17:10:26 - INFO - llmtuner.extras.callbacks - {'loss': 0.8187, 'learning_rate': 2.5428e-05, 'epoch': 1.49}
05/13/2024 17:11:21 - INFO - llmtuner.extras.callbacks - {'loss': 0.8231, 'learning_rate': 2.5000e-05, 'epoch': 1.50}
05/13/2024 17:12:18 - INFO - llmtuner.extras.callbacks - {'loss': 0.8665, 'learning_rate': 2.4572e-05, 'epoch': 1.52}
05/13/2024 17:13:13 - INFO - llmtuner.extras.callbacks - {'loss': 0.8602, 'learning_rate': 2.4145e-05, 'epoch': 1.54}
05/13/2024 17:14:06 - INFO - llmtuner.extras.callbacks - {'loss': 0.8373, 'learning_rate': 2.3717e-05, 'epoch': 1.55}
05/13/2024 17:15:05 - INFO - llmtuner.extras.callbacks - {'loss': 0.8408, 'learning_rate': 2.3290e-05, 'epoch': 1.57}
05/13/2024 17:15:59 - INFO - llmtuner.extras.callbacks - {'loss': 0.8608, 'learning_rate': 2.2864e-05, 'epoch': 1.58}
05/13/2024 17:16:54 - INFO - llmtuner.extras.callbacks - {'loss': 0.8022, 'learning_rate': 2.2438e-05, 'epoch': 1.60}
05/13/2024 17:17:50 - INFO - llmtuner.extras.callbacks - {'loss': 0.8377, 'learning_rate': 2.2013e-05, 'epoch': 1.62}
05/13/2024 17:18:45 - INFO - llmtuner.extras.callbacks - {'loss': 0.7821, 'learning_rate': 2.1588e-05, 'epoch': 1.63}
05/13/2024 17:18:45 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-500
05/13/2024 17:18:46 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 17:18:46 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 17:18:46 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-500/tokenizer_config.json
05/13/2024 17:18:46 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-500/special_tokens_map.json
05/13/2024 17:19:49 - INFO - llmtuner.extras.callbacks - {'loss': 0.7260, 'learning_rate': 2.1165e-05, 'epoch': 1.65}
05/13/2024 17:20:46 - INFO - llmtuner.extras.callbacks - {'loss': 0.8781, 'learning_rate': 2.0743e-05, 'epoch': 1.67}
05/13/2024 17:21:45 - INFO - llmtuner.extras.callbacks - {'loss': 0.7556, 'learning_rate': 2.0322e-05, 'epoch': 1.68}
05/13/2024 17:22:38 - INFO - llmtuner.extras.callbacks - {'loss': 0.8871, 'learning_rate': 1.9903e-05, 'epoch': 1.70}
05/13/2024 17:23:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.8132, 'learning_rate': 1.9485e-05, 'epoch': 1.71}
05/13/2024 17:24:32 - INFO - llmtuner.extras.callbacks - {'loss': 0.8246, 'learning_rate': 1.9068e-05, 'epoch': 1.73}
05/13/2024 17:25:30 - INFO - llmtuner.extras.callbacks - {'loss': 0.8208, 'learning_rate': 1.8654e-05, 'epoch': 1.75}
05/13/2024 17:26:22 - INFO - llmtuner.extras.callbacks - {'loss': 0.8654, 'learning_rate': 1.8241e-05, 'epoch': 1.76}
05/13/2024 17:27:17 - INFO - llmtuner.extras.callbacks - {'loss': 0.8768, 'learning_rate': 1.7830e-05, 'epoch': 1.78}
05/13/2024 17:28:13 - INFO - llmtuner.extras.callbacks - {'loss': 0.8441, 'learning_rate': 1.7421e-05, 'epoch': 1.80}
05/13/2024 17:29:14 - INFO - llmtuner.extras.callbacks - {'loss': 0.7793, 'learning_rate': 1.7015e-05, 'epoch': 1.81}
05/13/2024 17:30:09 - INFO - llmtuner.extras.callbacks - {'loss': 0.9148, 'learning_rate': 1.6610e-05, 'epoch': 1.83}
05/13/2024 17:31:04 - INFO - llmtuner.extras.callbacks - {'loss': 0.8797, 'learning_rate': 1.6209e-05, 'epoch': 1.85}
05/13/2024 17:31:55 - INFO - llmtuner.extras.callbacks - {'loss': 0.8601, 'learning_rate': 1.5810e-05, 'epoch': 1.86}
05/13/2024 17:32:52 - INFO - llmtuner.extras.callbacks - {'loss': 0.8930, 'learning_rate': 1.5413e-05, 'epoch': 1.88}
05/13/2024 17:33:54 - INFO - llmtuner.extras.callbacks - {'loss': 0.8200, 'learning_rate': 1.5020e-05, 'epoch': 1.89}
05/13/2024 17:34:55 - INFO - llmtuner.extras.callbacks - {'loss': 0.7547, 'learning_rate': 1.4629e-05, 'epoch': 1.91}
05/13/2024 17:35:50 - INFO - llmtuner.extras.callbacks - {'loss': 0.8500, 'learning_rate': 1.4241e-05, 'epoch': 1.93}
05/13/2024 17:36:46 - INFO - llmtuner.extras.callbacks - {'loss': 0.8089, 'learning_rate': 1.3857e-05, 'epoch': 1.94}
05/13/2024 17:37:43 - INFO - llmtuner.extras.callbacks - {'loss': 0.8238, 'learning_rate': 1.3475e-05, 'epoch': 1.96}
05/13/2024 17:37:43 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-600
05/13/2024 17:37:44 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 17:37:44 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 17:37:44 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-600/tokenizer_config.json
05/13/2024 17:37:44 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-600/special_tokens_map.json
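Each checkpoint-* directory written above should contain the LoRA adapter weights together with the tokenizer files the log reports saving, rather than a full copy of the base model. A quick way to sanity-check an intermediate checkpoint such as checkpoint-600 is to attach it to the 4-bit base model with peft, as sketched below; the test prompt and generation settings are arbitrary, and the chat template used during training is not shown in the log, so plain text is used here.

```python
# Hypothetical sanity check of an intermediate LoRA checkpoint from the run above.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base = "alpindale/Mistral-7B-v0.2-hf"
adapter = "saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-600"

tokenizer = AutoTokenizer.from_pretrained(adapter)  # tokenizer files were saved with the checkpoint
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True,
                                           bnb_4bit_compute_dtype=torch.float16),
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)  # attach the LoRA adapter on top of the base
model.eval()

prompt = "Write one short sentence about spring."  # arbitrary test prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```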
05/13/2024 17:38:39 - INFO - llmtuner.extras.callbacks - {'loss': 0.8118, 'learning_rate': 1.3097e-05, 'epoch': 1.98}
05/13/2024 17:39:36 - INFO - llmtuner.extras.callbacks - {'loss': 0.7964, 'learning_rate': 1.2723e-05, 'epoch': 1.99}
05/13/2024 17:40:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.8116, 'learning_rate': 1.2352e-05, 'epoch': 2.01}
05/13/2024 17:41:30 - INFO - llmtuner.extras.callbacks - {'loss': 0.7903, 'learning_rate': 1.1985e-05, 'epoch': 2.03}
05/13/2024 17:42:26 - INFO - llmtuner.extras.callbacks - {'loss': 0.8075, 'learning_rate': 1.1622e-05, 'epoch': 2.04}
05/13/2024 17:43:22 - INFO - llmtuner.extras.callbacks - {'loss': 0.7263, 'learning_rate': 1.1262e-05, 'epoch': 2.06}
05/13/2024 17:44:20 - INFO - llmtuner.extras.callbacks - {'loss': 0.8005, 'learning_rate': 1.0907e-05, 'epoch': 2.07}
05/13/2024 17:45:18 - INFO - llmtuner.extras.callbacks - {'loss': 0.8201, 'learning_rate': 1.0556e-05, 'epoch': 2.09}
05/13/2024 17:46:17 - INFO - llmtuner.extras.callbacks - {'loss': 0.7501, 'learning_rate': 1.0209e-05, 'epoch': 2.11}
05/13/2024 17:47:10 - INFO - llmtuner.extras.callbacks - {'loss': 0.8400, 'learning_rate': 9.8659e-06, 'epoch': 2.12}
05/13/2024 17:48:04 - INFO - llmtuner.extras.callbacks - {'loss': 0.7623, 'learning_rate': 9.5277e-06, 'epoch': 2.14}
05/13/2024 17:49:01 - INFO - llmtuner.extras.callbacks - {'loss': 0.7728, 'learning_rate': 9.1940e-06, 'epoch': 2.16}
05/13/2024 17:49:54 - INFO - llmtuner.extras.callbacks - {'loss': 0.8346, 'learning_rate': 8.8649e-06, 'epoch': 2.17}
05/13/2024 17:50:48 - INFO - llmtuner.extras.callbacks - {'loss': 0.8666, 'learning_rate': 8.5405e-06, 'epoch': 2.19}
05/13/2024 17:51:44 - INFO - llmtuner.extras.callbacks - {'loss': 0.8470, 'learning_rate': 8.2209e-06, 'epoch': 2.20}
05/13/2024 17:52:37 - INFO - llmtuner.extras.callbacks - {'loss': 0.8927, 'learning_rate': 7.9063e-06, 'epoch': 2.22}
05/13/2024 17:53:32 - INFO - llmtuner.extras.callbacks - {'loss': 0.7819, 'learning_rate': 7.5967e-06, 'epoch': 2.24}
05/13/2024 17:54:28 - INFO - llmtuner.extras.callbacks - {'loss': 0.8700, 'learning_rate': 7.2921e-06, 'epoch': 2.25}
05/13/2024 17:55:26 - INFO - llmtuner.extras.callbacks - {'loss': 0.8865, 'learning_rate': 6.9927e-06, 'epoch': 2.27}
05/13/2024 17:56:23 - INFO - llmtuner.extras.callbacks - {'loss': 0.8165, 'learning_rate': 6.6987e-06, 'epoch': 2.29}
05/13/2024 17:56:23 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-700
05/13/2024 17:56:24 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 17:56:24 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 17:56:24 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-700/tokenizer_config.json
05/13/2024 17:56:24 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-700/special_tokens_map.json
05/13/2024 17:57:17 - INFO - llmtuner.extras.callbacks - {'loss': 0.8516, 'learning_rate': 6.4099e-06, 'epoch': 2.30}
05/13/2024 17:58:13 - INFO - llmtuner.extras.callbacks - {'loss': 0.8405, 'learning_rate': 6.1266e-06, 'epoch': 2.32}
05/13/2024 17:59:10 - INFO - llmtuner.extras.callbacks - {'loss': 0.7471, 'learning_rate': 5.8489e-06, 'epoch': 2.34}
05/13/2024 18:00:07 - INFO - llmtuner.extras.callbacks - {'loss': 0.7705, 'learning_rate': 5.5767e-06, 'epoch': 2.35}
05/13/2024 18:01:06 - INFO - llmtuner.extras.callbacks - {'loss': 0.7544, 'learning_rate': 5.3103e-06, 'epoch': 2.37}
05/13/2024 18:02:04 - INFO - llmtuner.extras.callbacks - {'loss': 0.7918, 'learning_rate': 5.0496e-06, 'epoch': 2.38}
05/13/2024 18:03:03 - INFO - llmtuner.extras.callbacks - {'loss': 0.7330, 'learning_rate': 4.7947e-06, 'epoch': 2.40}
05/13/2024 18:04:02 - INFO - llmtuner.extras.callbacks - {'loss': 0.7565, 'learning_rate': 4.5458e-06, 'epoch': 2.42}
05/13/2024 18:05:05 - INFO - llmtuner.extras.callbacks - {'loss': 0.7352, 'learning_rate': 4.3028e-06, 'epoch': 2.43}
05/13/2024 18:06:01 - INFO - llmtuner.extras.callbacks - {'loss': 0.7959, 'learning_rate': 4.0659e-06, 'epoch': 2.45}
05/13/2024 18:07:01 - INFO - llmtuner.extras.callbacks - {'loss': 0.7456, 'learning_rate': 3.8351e-06, 'epoch': 2.47}
05/13/2024 18:07:55 - INFO - llmtuner.extras.callbacks - {'loss': 0.8174, 'learning_rate': 3.6106e-06, 'epoch': 2.48}
05/13/2024 18:08:56 - INFO - llmtuner.extras.callbacks - {'loss': 0.7971, 'learning_rate': 3.3923e-06, 'epoch': 2.50}
05/13/2024 18:09:50 - INFO - llmtuner.extras.callbacks - {'loss': 0.8101, 'learning_rate': 3.1803e-06, 'epoch': 2.52}
05/13/2024 18:10:46 - INFO - llmtuner.extras.callbacks - {'loss': 0.8410, 'learning_rate': 2.9747e-06, 'epoch': 2.53}
05/13/2024 18:11:37 - INFO - llmtuner.extras.callbacks - {'loss': 0.7600, 'learning_rate': 2.7756e-06, 'epoch': 2.55}
05/13/2024 18:12:36 - INFO - llmtuner.extras.callbacks - {'loss': 0.7527, 'learning_rate': 2.5829e-06, 'epoch': 2.56}
05/13/2024 18:13:36 - INFO - llmtuner.extras.callbacks - {'loss': 0.7840, 'learning_rate': 2.3968e-06, 'epoch': 2.58}
05/13/2024 18:14:33 - INFO - llmtuner.extras.callbacks - {'loss': 0.8014, 'learning_rate': 2.2174e-06, 'epoch': 2.60}
05/13/2024 18:15:27 - INFO - llmtuner.extras.callbacks - {'loss': 0.8016, 'learning_rate': 2.0446e-06, 'epoch': 2.61}
05/13/2024 18:15:27 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-800
05/13/2024 18:15:29 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 18:15:29 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 18:15:29 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-800/tokenizer_config.json
05/13/2024 18:15:29 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-800/special_tokens_map.json
05/13/2024 18:16:23 - INFO - llmtuner.extras.callbacks - {'loss': 0.8462, 'learning_rate': 1.8785e-06, 'epoch': 2.63}
05/13/2024 18:17:19 - INFO - llmtuner.extras.callbacks - {'loss': 0.8186, 'learning_rate': 1.7192e-06, 'epoch': 2.65}
05/13/2024 18:18:14 - INFO - llmtuner.extras.callbacks - {'loss': 0.8147, 'learning_rate': 1.5668e-06, 'epoch': 2.66}
05/13/2024 18:19:13 - INFO - llmtuner.extras.callbacks - {'loss': 0.7950, 'learning_rate': 1.4211e-06, 'epoch': 2.68}
05/13/2024 18:20:08 - INFO - llmtuner.extras.callbacks - {'loss': 0.8813, 'learning_rate': 1.2824e-06, 'epoch': 2.69}
05/13/2024 18:21:05 - INFO - llmtuner.extras.callbacks - {'loss': 0.8220, 'learning_rate': 1.1507e-06, 'epoch': 2.71}
05/13/2024 18:21:59 - INFO - llmtuner.extras.callbacks - {'loss': 0.8649, 'learning_rate': 1.0259e-06, 'epoch': 2.73}
05/13/2024 18:22:53 - INFO - llmtuner.extras.callbacks - {'loss': 0.8135, 'learning_rate': 9.0810e-07, 'epoch': 2.74}
05/13/2024 18:23:50 - INFO - llmtuner.extras.callbacks - {'loss': 0.8571, 'learning_rate': 7.9738e-07, 'epoch': 2.76}
05/13/2024 18:24:39 - INFO - llmtuner.extras.callbacks - {'loss': 0.8362, 'learning_rate': 6.9375e-07, 'epoch': 2.78}
05/13/2024 18:25:36 - INFO - llmtuner.extras.callbacks - {'loss': 0.9153, 'learning_rate': 5.9724e-07, 'epoch': 2.79}
05/13/2024 18:26:34 - INFO - llmtuner.extras.callbacks - {'loss': 0.8202, 'learning_rate': 5.0787e-07, 'epoch': 2.81}
05/13/2024 18:27:29 - INFO - llmtuner.extras.callbacks - {'loss': 0.8176, 'learning_rate': 4.2567e-07, 'epoch': 2.83}
05/13/2024 18:28:27 - INFO - llmtuner.extras.callbacks - {'loss': 0.8016, 'learning_rate': 3.5067e-07, 'epoch': 2.84}
05/13/2024 18:29:25 - INFO - llmtuner.extras.callbacks - {'loss': 0.7570, 'learning_rate': 2.8288e-07, 'epoch': 2.86}
05/13/2024 18:30:23 - INFO - llmtuner.extras.callbacks - {'loss': 0.8640, 'learning_rate': 2.2234e-07, 'epoch': 2.87}
05/13/2024 18:31:22 - INFO - llmtuner.extras.callbacks - {'loss': 0.7959, 'learning_rate': 1.6904e-07, 'epoch': 2.89}
05/13/2024 18:32:19 - INFO - llmtuner.extras.callbacks - {'loss': 0.8220, 'learning_rate': 1.2302e-07, 'epoch': 2.91}
05/13/2024 18:33:15 - INFO - llmtuner.extras.callbacks - {'loss': 0.8096, 'learning_rate': 8.4276e-08, 'epoch': 2.92}
05/13/2024 18:34:12 - INFO - llmtuner.extras.callbacks - {'loss': 0.8249, 'learning_rate': 5.2830e-08, 'epoch': 2.94}
05/13/2024 18:34:12 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-900
05/13/2024 18:34:14 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 18:34:14 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 18:34:14 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-900/tokenizer_config.json
05/13/2024 18:34:14 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/checkpoint-900/special_tokens_map.json
05/13/2024 18:35:06 - INFO - llmtuner.extras.callbacks - {'loss': 0.8314, 'learning_rate': 2.8688e-08, 'epoch': 2.96}
05/13/2024 18:36:06 - INFO - llmtuner.extras.callbacks - {'loss': 0.7487, 'learning_rate': 1.1857e-08, 'epoch': 2.97}
05/13/2024 18:37:01 - INFO - llmtuner.extras.callbacks - {'loss': 0.8762, 'learning_rate': 2.3423e-09, 'epoch': 2.99}
05/13/2024 18:37:37 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
05/13/2024 18:37:37 - INFO - transformers.trainer - Saving model checkpoint to saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20
05/13/2024 18:37:38 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/featurize/.cache/huggingface/hub/models--alpindale--Mistral-7B-v0.2-hf/snapshots/2c3e624962b1a3f3fbf52e15969565caa7bc064a/config.json
05/13/2024 18:37:38 - INFO - transformers.configuration_utils - Model config MistralConfig { "architectures": [ "MistralForCausalLM" ], "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 32768, "model_type": "mistral", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "rms_norm_eps": 1e-05, "rope_theta": 1000000.0, "sliding_window": null, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.40.2", "use_cache": true, "vocab_size": 32000 }
05/13/2024 18:37:38 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/tokenizer_config.json
05/13/2024 18:37:38 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Mistral-7B-v0.2/lora/train_2024-05-13-15-43-20/special_tokens_map.json
05/13/2024 18:37:38 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}