07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file tokenizer.model from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/tokenizer.model
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/tokenizer.json
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/added_tokens.json
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/special_tokens_map.json
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/tokenizer_config.json
07/03/2024 15:04:16 - INFO - transformers.image_processing_base - loading configuration file preprocessor_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/preprocessor_config.json
07/03/2024 15:04:16 - INFO - transformers.image_processing_base - loading configuration file preprocessor_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/preprocessor_config.json
07/03/2024 15:04:16 - INFO - transformers.image_processing_base - Image processor LlavaNextImageProcessor {
"aspect_ratio_setting": "anyres",
"crop_size": {
"height": 336,
"width": 336
},
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_pad": true,
"do_rescale": true,
"do_resize": true,
"image_grid_pinpoints": [
[
336,
672
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
336,
1008
]
],
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "LlavaNextImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"processor_class": "LlavaNextProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"shortest_edge": 336
}
}
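
The "anyres" setting above means each input image is matched against image_grid_pinpoints and tiled into 336x336 crops. A minimal sketch of the resolution-selection idea, for reading the log only (an illustrative reimplementation, not the transformers-internal code; the example image size is made up):

# Illustrative sketch of "anyres" pinpoint selection (not the library function itself).
PINPOINTS = [(336, 672), (672, 336), (672, 672), (1008, 336), (336, 1008)]  # (height, width)

def select_pinpoint(orig_h, orig_w, pinpoints=PINPOINTS):
    best, best_effective, best_waste = None, -1, float("inf")
    for cand_h, cand_w in pinpoints:
        scale = min(cand_w / orig_w, cand_h / orig_h)       # fit the image inside the candidate
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        effective = min(down_w * down_h, orig_w * orig_h)   # useful pixels after resizing
        waste = cand_w * cand_h - effective                 # padded pixels
        if effective > best_effective or (effective == best_effective and waste < best_waste):
            best, best_effective, best_waste = (cand_h, cand_w), effective, waste
    return best

print(select_pinpoint(600, 800))  # hypothetical 800x600 photo -> (672, 672), i.e. a 2x2 grid of 336px tiles
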
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file tokenizer.model from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/tokenizer.model
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/tokenizer.json
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/added_tokens.json
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/special_tokens_map.json
07/03/2024 15:04:16 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/tokenizer_config.json
07/03/2024 15:04:16 - INFO - transformers.processing_utils - Processor LlavaNextProcessor:
- image_processor: LlavaNextImageProcessor {
"aspect_ratio_setting": "anyres",
"crop_size": {
"height": 336,
"width": 336
},
"do_center_crop": true,
"do_convert_rgb": true,
"do_normalize": true,
"do_pad": true,
"do_rescale": true,
"do_resize": true,
"image_grid_pinpoints": [
[
336,
672
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
336,
1008
]
],
"image_mean": [
0.48145466,
0.4578275,
0.40821073
],
"image_processor_type": "LlavaNextImageProcessor",
"image_std": [
0.26862954,
0.26130258,
0.27577711
],
"processor_class": "LlavaNextProcessor",
"resample": 3,
"rescale_factor": 0.00392156862745098,
"size": {
"shortest_edge": 336
}
}
- tokenizer: LlamaTokenizerFast(name_or_path='llava-hf/llava-v1.6-vicuna-7b-hf', vocab_size=32000, model_max_length=4096, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<unk>'}, clean_up_tokenization_spaces=False), added_tokens_decoder={
0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
32000: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
{
"chat_template": null,
"processor_class": "LlavaNextProcessor"
}
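
The dump above is the full LlavaNextProcessor: the anyres image processor plus a LlamaTokenizerFast with the <image> placeholder mapped to id 32000 and no chat_template. A minimal usage sketch, assuming a local example.jpg (placeholder) and the vicuna-style prompt format documented for this model:

from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-vicuna-7b-hf")

image = Image.open("example.jpg")  # placeholder image path
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"  # vicuna-style prompt

inputs = processor(text=prompt, images=image, return_tensors="pt")
# input_ids carries the <image> placeholder as id 32000; pixel_values holds the anyres tiles;
# image_sizes lets the model rebuild the patch grid.
print({k: tuple(v.shape) for k, v in inputs.items()})
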
07/03/2024 15:04:16 - INFO - llamafactory.data.loader - Loading dataset tminh/test-llava4...
07/03/2024 15:04:24 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/config.json
07/03/2024 15:04:24 - INFO - transformers.configuration_utils - Model config LlavaNextConfig {
"_name_or_path": "llava-hf/llava-v1.6-vicuna-7b-hf",
"architectures": [
"LlavaNextForConditionalGeneration"
],
"ignore_index": -100,
"image_grid_pinpoints": [
[
336,
672
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
336,
1008
]
],
"image_token_index": 32000,
"model_type": "llava_next",
"projector_hidden_act": "gelu",
"text_config": {
"_name_or_path": "lmsys/vicuna-7b-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"max_position_embeddings": 4096,
"model_type": "llama",
"pad_token_id": 0,
"rms_norm_eps": 1e-05,
"torch_dtype": "float16",
"vocab_size": 32064
},
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.42.3",
"use_image_newline_parameter": true,
"vision_config": {
"hidden_size": 1024,
"image_size": 336,
"intermediate_size": 4096,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768,
"vocab_size": 32000
},
"vision_feature_layer": -2,
"vision_feature_select_strategy": "default",
"vocab_size": 32064
}
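
This config is what the safetensors shards further down are loaded against: a CLIP ViT-L/14 vision tower at 336px, a Vicuna-7B language model, and a text vocab padded to 32064 to fit the image token. A short loading sketch, assuming a single-GPU machine; the dtype mirrors the bfloat16 instantiation logged below:

import torch
from transformers import LlavaNextForConditionalGeneration

model = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf",
    torch_dtype=torch.bfloat16,   # matches the dtype the trainer instantiates under
    device_map="auto",            # assumption: single-GPU setup; adjust to your hardware
)
print(model.config.image_token_index, model.config.vision_feature_layer)  # 32000 -2
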
07/03/2024 15:04:25 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/config.json
07/03/2024 15:04:25 - INFO - transformers.configuration_utils - Model config LlavaNextConfig {
"_name_or_path": "llava-hf/llava-v1.6-vicuna-7b-hf",
"architectures": [
"LlavaNextForConditionalGeneration"
],
"ignore_index": -100,
"image_grid_pinpoints": [
[
336,
672
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
336,
1008
]
],
"image_token_index": 32000,
"model_type": "llava_next",
"projector_hidden_act": "gelu",
"text_config": {
"_name_or_path": "lmsys/vicuna-7b-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"max_position_embeddings": 4096,
"model_type": "llama",
"pad_token_id": 0,
"rms_norm_eps": 1e-05,
"torch_dtype": "float16",
"vocab_size": 32064
},
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.42.3",
"use_image_newline_parameter": true,
"vision_config": {
"hidden_size": 1024,
"image_size": 336,
"intermediate_size": 4096,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768,
"vocab_size": 32000
},
"vision_feature_layer": -2,
"vision_feature_select_strategy": "default",
"vocab_size": 32064
}
07/03/2024 15:04:25 - WARNING - llamafactory.model.model_utils.unsloth - Unsloth does not support model type llava_next.
07/03/2024 15:04:25 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/model.safetensors.index.json
07/03/2024 15:05:57 - INFO - transformers.modeling_utils - Instantiating LlavaNextForConditionalGeneration model under default dtype torch.bfloat16.
07/03/2024 15:05:57 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {}
07/03/2024 15:05:58 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0
}
07/03/2024 15:08:02 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing LlavaNextForConditionalGeneration.
07/03/2024 15:08:02 - INFO - transformers.modeling_utils - All the weights of LlavaNextForConditionalGeneration were initialized from the model checkpoint at llava-hf/llava-v1.6-vicuna-7b-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlavaNextForConditionalGeneration for predictions without further training.
07/03/2024 15:08:02 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/generation_config.json
07/03/2024 15:08:02 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0
}
07/03/2024 15:08:02 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/03/2024 15:08:02 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/03/2024 15:08:02 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/03/2024 15:08:02 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
07/03/2024 15:08:02 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,q_proj,up_proj,linear_2,o_proj,down_proj,k_proj,gate_proj,linear_1
07/03/2024 15:08:02 - INFO - llamafactory.model.loader - trainable params: 5023744 || all params: 7068454912 || trainable%: 0.0711
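
The adapter wraps the language model's attention and MLP projections plus the two multimodal-projector linears listed above. A sketch of an equivalent peft setup; the rank, alpha, and dropout are not recorded in this log, so the values below are placeholders rather than the run's actual settings:

import torch
from peft import LoraConfig, get_peft_model
from transformers import LlavaNextForConditionalGeneration

base = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf", torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    r=8,                # placeholder: the actual rank is not in the log
    lora_alpha=16,      # placeholder
    lora_dropout=0.05,  # placeholder
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # language-model attention projections
        "gate_proj", "up_proj", "down_proj",      # language-model MLP projections
        "linear_1", "linear_2",                   # multimodal projector
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # with placeholder values this will differ from the run's 5,023,744
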
07/03/2024 15:08:02 - INFO - transformers.trainer - Using auto half precision backend
07/03/2024 15:08:02 - INFO - transformers.trainer - ***** Running training *****
07/03/2024 15:08:02 - INFO - transformers.trainer - Num examples = 100
07/03/2024 15:08:02 - INFO - transformers.trainer - Num Epochs = 1
07/03/2024 15:08:02 - INFO - transformers.trainer - Instantaneous batch size per device = 1
07/03/2024 15:08:02 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 8
07/03/2024 15:08:02 - INFO - transformers.trainer - Gradient Accumulation steps = 8
07/03/2024 15:08:02 - INFO - transformers.trainer - Total optimization steps = 12
07/03/2024 15:08:02 - INFO - transformers.trainer - Number of trainable parameters = 5,023,744
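
The step count follows directly from the numbers above; a small sanity check, assuming the single-process run the log suggests:

# Where "Total optimization steps = 12" comes from.
num_examples = 100
per_device_batch_size = 1
grad_accum_steps = 8
num_epochs = 1
num_devices = 1  # assumption: single-process run

effective_batch_size = per_device_batch_size * grad_accum_steps * num_devices  # 8, as logged
steps_per_epoch = num_examples // effective_batch_size                         # 100 // 8 = 12
print(effective_batch_size, steps_per_epoch * num_epochs)                      # 8 12
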
07/03/2024 15:08:04 - WARNING - transformers.models.llama.modeling_llama - `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
07/03/2024 15:10:44 - INFO - llamafactory.extras.callbacks - {'loss': 1.1881, 'learning_rate': 3.1470e-05, 'epoch': 0.40, 'throughput': 113.65}
07/03/2024 15:13:42 - INFO - llamafactory.extras.callbacks - {'loss': 1.1695, 'learning_rate': 3.3494e-06, 'epoch': 0.80, 'throughput': 111.00}
07/03/2024 15:14:56 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaVA1.5-7B-Chat/lora/train_2024-07-03-14-56-50/checkpoint-12
07/03/2024 15:14:56 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/config.json
07/03/2024 15:14:56 - INFO - transformers.configuration_utils - Model config LlavaNextConfig {
"architectures": [
"LlavaNextForConditionalGeneration"
],
"ignore_index": -100,
"image_grid_pinpoints": [
[
336,
672
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
336,
1008
]
],
"image_token_index": 32000,
"model_type": "llava_next",
"projector_hidden_act": "gelu",
"text_config": {
"_name_or_path": "lmsys/vicuna-7b-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"max_position_embeddings": 4096,
"model_type": "llama",
"pad_token_id": 0,
"rms_norm_eps": 1e-05,
"torch_dtype": "float16",
"vocab_size": 32064
},
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.42.3",
"use_image_newline_parameter": true,
"vision_config": {
"hidden_size": 1024,
"image_size": 336,
"intermediate_size": 4096,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768,
"vocab_size": 32000
},
"vision_feature_layer": -2,
"vision_feature_select_strategy": "default",
"vocab_size": 32064
}
07/03/2024 15:14:56 - INFO - transformers.image_processing_base - Image processor saved in saves/LLaVA1.5-7B-Chat/lora/train_2024-07-03-14-56-50/checkpoint-12/preprocessor_config.json
07/03/2024 15:14:56 - INFO - transformers.trainer -
Training completed. Do not forget to share your model on huggingface.co/models =)
07/03/2024 15:14:56 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaVA1.5-7B-Chat/lora/train_2024-07-03-14-56-50
07/03/2024 15:14:56 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /home/ubuntu/.cache/huggingface/hub/models--llava-hf--llava-v1.6-vicuna-7b-hf/snapshots/382e15404d46a627a15b299a41618e2efb04f2a7/config.json
07/03/2024 15:14:56 - INFO - transformers.configuration_utils - Model config LlavaNextConfig {
"architectures": [
"LlavaNextForConditionalGeneration"
],
"ignore_index": -100,
"image_grid_pinpoints": [
[
336,
672
],
[
672,
336
],
[
672,
672
],
[
1008,
336
],
[
336,
1008
]
],
"image_token_index": 32000,
"model_type": "llava_next",
"projector_hidden_act": "gelu",
"text_config": {
"_name_or_path": "lmsys/vicuna-7b-v1.5",
"architectures": [
"LlamaForCausalLM"
],
"max_position_embeddings": 4096,
"model_type": "llama",
"pad_token_id": 0,
"rms_norm_eps": 1e-05,
"torch_dtype": "float16",
"vocab_size": 32064
},
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.42.3",
"use_image_newline_parameter": true,
"vision_config": {
"hidden_size": 1024,
"image_size": 336,
"intermediate_size": 4096,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768,
"vocab_size": 32000
},
"vision_feature_layer": -2,
"vision_feature_select_strategy": "default",
"vocab_size": 32064
}
07/03/2024 15:14:56 - INFO - transformers.image_processing_base - Image processor saved in saves/LLaVA1.5-7B-Chat/lora/train_2024-07-03-14-56-50/preprocessor_config.json
07/03/2024 15:14:56 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
07/03/2024 15:14:56 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
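
The trained LoRA adapter and processor files end up under saves/LLaVA1.5-7B-Chat/lora/train_2024-07-03-14-56-50. A sketch of reloading them for inference with peft; it assumes the adapter_config.json and tokenizer files were saved alongside the logged preprocessor_config.json, which is LLaMA-Factory's usual behavior:

import torch
from peft import PeftModel
from transformers import AutoProcessor, LlavaNextForConditionalGeneration

adapter_dir = "saves/LLaVA1.5-7B-Chat/lora/train_2024-07-03-14-56-50"  # from the log above

base = LlavaNextForConditionalGeneration.from_pretrained(
    "llava-hf/llava-v1.6-vicuna-7b-hf", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)
model = model.merge_and_unload()  # optional: fold the LoRA deltas into the base weights

# Assumes processor/tokenizer files exist in adapter_dir; otherwise load from the base model id.
processor = AutoProcessor.from_pretrained(adapter_dir)
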