sgraham committed on
Commit 472df12 · 1 Parent(s): 37b8930

dec12 fine tuning with reduced learning rate

Files changed (2)
  1. 0_CLIPModel/model.safetensors +1 -1
  2. README.txt +58 -64
0_CLIPModel/model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:83bcf2d8c66a985c30c31f1887a1c7fdf0637e4ad4d8888d5ef0d8323c21b126
+ oid sha256:2ef1b274a0b0abfd9b447d6b341178801efb904bda0a6a5972c026e95e3796b0
  size 605156676
README.txt CHANGED
@@ -1,67 +1,61 @@
- python -W ignore finetune-clip-huggingface/huggingface_finetune_clip.py --output_dir /home/ekansa/github/archaeology-images-ai/results --model_name_or_path openai/clip-vit-base-patch32 --train_file /home/ekansa/github/archaeology-images-ai/files/train.json --validation_file /home/ekansa/github/archaeology-images-ai/files/test.json --image_column image --overwrite_output_dir=True --max_seq_length=77 --num_train_epochs=20 --caption_column caption
- --overwrite_cache=True --remove_unused_columns=False --do_train --per_device_train_batch_size=64 --per_device_eval_batch_size=64 --learning_rate="5e-5" --warmup_steps="2" --weight_decay 0.2
- 11/13/2023 19:25:55 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
- Running tokenizer on train dataset: 100%|███████████████████████████████████| 39755/39755 [00:01<00:00, 22415.96 examples/s]
- Parameter 'transform'=<function main.<locals>.transform_images at 0x7fded1ed0cc0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
- 11/13/2023 19:26:00 - WARNING - datasets.fingerprint - Parameter 'transform'=<function main.<locals>.transform_images at 0x7fded1ed0cc0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
- {'loss': 2.1774, 'learning_rate': 4.799807042932948e-05, 'epoch': 0.8}
- {'loss': 1.553, 'learning_rate': 4.5988100980865096e-05, 'epoch': 1.61}
- {'loss': 1.2706, 'learning_rate': 4.3978131532400705e-05, 'epoch': 2.41}
- {'loss': 1.0318, 'learning_rate': 4.196816208393632e-05, 'epoch': 3.22}
- {'loss': 0.8534, 'learning_rate': 3.9958192635471945e-05, 'epoch': 4.02}
- {'loss': 0.673, 'learning_rate': 3.794822318700756e-05, 'epoch': 4.82}
- {'loss': 0.564, 'learning_rate': 3.593825373854318e-05, 'epoch': 5.63}
- {'loss': 0.496, 'learning_rate': 3.3928284290078794e-05, 'epoch': 6.43}
- {'loss': 0.4287, 'learning_rate': 3.191831484161441e-05, 'epoch': 7.23}
- {'loss': 0.3796, 'learning_rate': 2.9908345393150027e-05, 'epoch': 8.04}
- {'loss': 0.3378, 'learning_rate': 2.789837594468564e-05, 'epoch': 8.84}
- {'loss': 0.3009, 'learning_rate': 2.5888406496221256e-05, 'epoch': 9.65}
- {'loss': 0.2707, 'learning_rate': 2.3878437047756876e-05, 'epoch': 10.45}
- {'loss': 0.2552, 'learning_rate': 2.1868467599292492e-05, 'epoch': 11.25}
- {'loss': 0.2293, 'learning_rate': 1.985849815082811e-05, 'epoch': 12.06}
- {'loss': 0.212, 'learning_rate': 1.7848528702363725e-05, 'epoch': 12.86}
- {'loss': 0.1879, 'learning_rate': 1.583855925389934e-05, 'epoch': 13.67}
- {'loss': 0.1782, 'learning_rate': 1.3828589805434958e-05, 'epoch': 14.47}
- {'loss': 0.1726, 'learning_rate': 1.1818620356970576e-05, 'epoch': 15.27}
- {'loss': 0.153, 'learning_rate': 9.80865090850619e-06, 'epoch': 16.08}
- {'loss': 0.1456, 'learning_rate': 7.798681460041808e-06, 'epoch': 16.88}
- {'loss': 0.1397, 'learning_rate': 5.788712011577424e-06, 'epoch': 17.68}
- {'loss': 0.1326, 'learning_rate': 3.7787425631130406e-06, 'epoch': 18.49}
- {'loss': 0.1228, 'learning_rate': 1.7687731146486576e-06, 'epoch': 19.29}
- {'train_runtime': 40077.3502, 'train_samples_per_second': 19.839, 'train_steps_per_second': 0.31, 'train_loss': 0.4972445463444259, 'epoch': 20.0}
- 100%|███████████████████████████████████████████████████████████████████████████| 12440/12440 [11:07:57<00:00, 3.22s/it]
- ***** train metrics *****
- epoch = 20.0
- train_loss = 0.4972
- train_runtime = 11:07:57.35
- train_samples_per_second = 19.839
- train_steps_per_second = 0.31

- Then restarted at checkpoint:
- 06:34 $ python -W ignore finetune-clip-huggingface/huggingface_finetune_clip.py --output_dir /home/ekansa/github/archaeology-images-ai/results --model_name_or_path openai/clip-vit-base-patch32 --train_file /home/ekansa/github/archaeology-images-ai/files/train.json --validation_file /home/ekansa/github/archaeology-images-ai/files/test.json --image_column image --overwrite_output_dir=False --max_seq_length=77 --num_train_epochs=30 --caption_column caption --overwrite_cache=True --remove_unused_columns=False --do_train --per_device_train_batch_size=64 --per_device_eval_batch_size=64 --learning_rate="5e-5" --warmup_steps="2" --weight_decay 0.2
- 11/14/2023 08:43:58 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
- Running tokenizer on train dataset: 100%|███████████████████████████████████| 39755/39755 [00:02<00:00, 13727.10 examples/s]
- Parameter 'transform'=<function main.<locals>.transform_images at 0x7f2655088cc0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
- 11/14/2023 08:44:08 - WARNING - datasets.fingerprint - Parameter 'transform'=<function main.<locals>.transform_images at 0x7f2655088cc0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
- {'loss': 0.1433, 'learning_rate': 1.650766427269804e-05, 'epoch': 20.1}
- {'loss': 0.1539, 'learning_rate': 1.5167756458355668e-05, 'epoch': 20.9}
- {'loss': 0.1532, 'learning_rate': 1.3827848644013291e-05, 'epoch': 21.7}
- {'loss': 0.147, 'learning_rate': 1.2487940829670919e-05, 'epoch': 22.51}
- {'loss': 0.1423, 'learning_rate': 1.1148033015328546e-05, 'epoch': 23.31}
- {'loss': 0.1334, 'learning_rate': 9.808125200986172e-06, 'epoch': 24.12}
- {'loss': 0.1329, 'learning_rate': 8.468217386643799e-06, 'epoch': 24.92}
- {'loss': 0.1228, 'learning_rate': 7.1283095723014256e-06, 'epoch': 25.72}
- {'loss': 0.1234, 'learning_rate': 5.788401757959053e-06, 'epoch': 26.53}
- {'loss': 0.1166, 'learning_rate': 4.448493943616679e-06, 'epoch': 27.33}
- {'loss': 0.1131, 'learning_rate': 3.108586129274306e-06, 'epoch': 28.14}
- {'loss': 0.1118, 'learning_rate': 1.7686783149319325e-06, 'epoch': 28.94}
- {'loss': 0.11, 'learning_rate': 4.2877050058955945e-07, 'epoch': 29.74}
- {'train_runtime': 31058.1839, 'train_samples_per_second': 38.401, 'train_steps_per_second': 0.601, 'train_loss': 0.046548270668119736, 'epoch': 30.0}
- 100%|███████████████████████████████████████████████████████████████████████████| 18660/18660 [8:37:38<00:00, 1.66s/it]
  ***** train metrics *****
- epoch = 30.0
- train_loss = 0.0465
- train_runtime = 8:37:38.18
- train_samples_per_second = 38.401
- train_steps_per_second = 0.601

+ ------------------------------------
+ ABOUT THIS MODEL
+
+ This model is the result of "fine-tuning" the openai/clip-vit-base-patch32 model using captioned images of archaeological artifacts published by Open Context. This model is the latest of several iterations in experiments to improve the captions, debug the training pipeline, and try different fine-tuning parameters. It seems that fine-tuning with a relatively low learning rate adds some "archaeological knowledge" while still retaining much of the general knowledge of out-of-the-box CLIP.
+
+ We'll use this fine-tuned model in future experiments, including further fine-tuning with captioned images from open-access museum collections of archaeological materials. Below, after a brief loading sketch, we itemize the specific training parameters used in the fine-tuning of this model.
+
+ ------------------------------------
+
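Before the training details, a minimal sketch of how the resulting checkpoint might be loaded for zero-shot image-text matching with the transformers library. The local path, the example image file, and the candidate captions are hypothetical; the sketch assumes the 0_CLIPModel/ folder carries the usual CLIP config, tokenizer, and preprocessor files. If the preprocessing files are absent, the stock openai/clip-vit-base-patch32 processor should behave the same way, since fine-tuning does not change preprocessing.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical local path: a clone of this repository, whose 0_CLIPModel/ folder holds the fine-tuned weights.
model_dir = "./0_CLIPModel"

model = CLIPModel.from_pretrained(model_dir)
processor = CLIPProcessor.from_pretrained(model_dir)  # or "openai/clip-vit-base-patch32" if no processor files are present here

image = Image.open("artifact.jpg")  # any artifact photograph (hypothetical file name)
captions = ["a pottery sherd", "a glass bead", "a bronze coin"]  # hypothetical candidate captions

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # similarity of the image to each caption
print(dict(zip(captions, probs[0].tolist())))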
+ python -W ignore finetune-clip-huggingface/huggingface_finetune_clip.py --output_dir /home/ekansa/github/archaeology-images-ai/results --model_name_or_path openai/clip-vit-base-patch32 --train_file /home/ekansa/github/archaeology-images-ai/files/train.json --validation_file /home/ekansa/github/archaeology-images-ai/files/test.json --image_column="image_path" --overwrite_output_dir=True --max_seq_length=77 --num_train_epochs=25 --caption_column="caption" --overwrite_cache=True --remove_unused_columns=False --do_train=True --per_device_train_batch_size=64 --per_device_eval_batch_size=64 --learning_rate="2e-5" --warmup_steps="2" --weight_decay 0.2
+
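The contents of train.json and test.json are not reproduced here. Given --image_column="image_path" and --caption_column="caption", each training record presumably pairs an image path with its caption, one JSON object per line (the form the datasets JSON loader reads). The records below are invented purely to illustrate that shape; only the two field names come from the command line above.

import json

# Invented example records; the paths and caption text are illustrative only.
records = [
    {"image_path": "/home/ekansa/github/archaeology-images-ai/images/example-0001.jpg",
     "caption": "a ceramic bowl fragment with painted decoration"},
    {"image_path": "/home/ekansa/github/archaeology-images-ai/images/example-0002.jpg",
     "caption": "a worked bone tool from a domestic context"},
]

with open("train.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")  # one object per line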
+ 12/10/2023 21:35:43 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
+ Running tokenizer on train dataset: 100%|███████████████████████████████| 45256/45256 [00:02<00:00, 21481.25 examples/s]
+ Parameter 'transform'=<function main.<locals>.transform_images at 0x7fe53504d9e0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
+ 12/10/2023 21:35:47 - WARNING - datasets.fingerprint - Parameter 'transform'=<function main.<locals>.transform_images at 0x7fe53504d9e0> of the transform datasets.arrow_dataset.Dataset.set_format couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
+
+ {'loss': 1.7174, 'learning_rate': 1.9437224545146346e-05, 'epoch': 0.71}
+ {'loss': 1.1706, 'learning_rate': 1.887218894790372e-05, 'epoch': 1.41}
+ {'loss': 0.9596, 'learning_rate': 1.8307153350661094e-05, 'epoch': 2.12}
+ {'loss': 0.7291, 'learning_rate': 1.7742117753418467e-05, 'epoch': 2.82}
+ {'loss': 0.5833, 'learning_rate': 1.717708215617584e-05, 'epoch': 3.53}
+ {'loss': 0.5094, 'learning_rate': 1.6612046558933215e-05, 'epoch': 4.24}
+ {'loss': 0.4368, 'learning_rate': 1.6047010961690588e-05, 'epoch': 4.94}
+ {'loss': 0.365, 'learning_rate': 1.548197536444796e-05, 'epoch': 5.65}
+ {'loss': 0.3394, 'learning_rate': 1.4916939767205336e-05, 'epoch': 6.36}
+ {'loss': 0.3159, 'learning_rate': 1.4351904169962709e-05, 'epoch': 7.06}
+ {'loss': 0.2776, 'learning_rate': 1.3786868572720083e-05, 'epoch': 7.77}
+ {'loss': 0.2584, 'learning_rate': 1.3221832975477456e-05, 'epoch': 8.47}
+ {'loss': 0.2464, 'learning_rate': 1.2656797378234832e-05, 'epoch': 9.18}
+ {'loss': 0.227, 'learning_rate': 1.2091761780992204e-05, 'epoch': 9.89}
+ {'loss': 0.2116, 'learning_rate': 1.1526726183749577e-05, 'epoch': 10.59}
+ {'loss': 0.2026, 'learning_rate': 1.0961690586506951e-05, 'epoch': 11.3}
+ {'loss': 0.1869, 'learning_rate': 1.0396654989264325e-05, 'epoch': 12.01}
+ {'loss': 0.1792, 'learning_rate': 9.831619392021698e-06, 'epoch': 12.71}
+ {'loss': 0.167, 'learning_rate': 9.266583794779072e-06, 'epoch': 13.42}
+ {'loss': 0.1671, 'learning_rate': 8.701548197536446e-06, 'epoch': 14.12}
+ {'loss': 0.154, 'learning_rate': 8.136512600293819e-06, 'epoch': 14.83}
+ {'loss': 0.1574, 'learning_rate': 7.571477003051193e-06, 'epoch': 15.54}
+ {'loss': 0.1496, 'learning_rate': 7.006441405808566e-06, 'epoch': 16.24}
+ {'loss': 0.1329, 'learning_rate': 5.876370211323313e-06, 'epoch': 17.66}
+ {'loss': 0.1316, 'learning_rate': 5.311334614080687e-06, 'epoch': 18.36}
+ {'loss': 0.1254, 'learning_rate': 4.746299016838062e-06, 'epoch': 19.07}
+ {'loss': 0.1266, 'learning_rate': 4.181263419595435e-06, 'epoch': 19.77}
+ {'loss': 0.1193, 'learning_rate': 3.6162278223528084e-06, 'epoch': 20.48}
+ {'loss': 0.1163, 'learning_rate': 3.0511922251101822e-06, 'epoch': 21.19}
+ {'loss': 0.1154, 'learning_rate': 2.486156627867556e-06, 'epoch': 21.89}
+ {'loss': 0.1125, 'learning_rate': 1.9211210306249294e-06, 'epoch': 22.6}
+ {'loss': 0.1063, 'learning_rate': 1.356085433382303e-06, 'epoch': 23.31}
+ {'loss': 0.1082, 'learning_rate': 7.91049836139677e-07, 'epoch': 24.01}
+ {'loss': 0.1032, 'learning_rate': 2.2601423889705053e-07, 'epoch': 24.72}
+
+ {'train_runtime': 78442.5601, 'train_samples_per_second': 14.423, 'train_steps_per_second': 0.226, 'train_loss': 0.31630637788503185, 'epoch': 25.0}
+ 100%|███████████████████████████████████████████████████████████████████████████| 17700/17700 [21:47:22<00:00, 4.43s/it]
  ***** train metrics *****
+ epoch = 25.0
+ train_loss = 0.3163
+ train_runtime = 21:47:22.56
+ train_samples_per_second = 14.423
+ train_steps_per_second = 0.226
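The logged numbers above are internally consistent. A quick back-of-the-envelope check, assuming the Hugging Face Trainer's default linear learning-rate decay after the 2 warmup steps (the scheduler is not stated explicitly in this log):

import math

examples, batch_size, epochs, warmup = 45256, 64, 25, 2  # from the tokenizer log and the command line above

steps_per_epoch = math.ceil(examples / batch_size)  # 708 optimizer steps per epoch
total_steps = steps_per_epoch * epochs
print(total_steps)  # 17700, matching the progress bar

# Linear decay from the peak rate after warmup (the Trainer's default schedule).
def lr_at(step, peak=2e-5):
    if step < warmup:
        return peak * step / warmup
    return peak * (total_steps - step) / (total_steps - warmup)

print(lr_at(500))  # ~1.9437e-05, the first logged learning_rate (epoch 0.71 = 500/708)
print(examples * epochs / 78442.5601)  # ~14.42, matching train_samples_per_second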