lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_5e-4_lora2
This model is a fine-tuned version of Qwen/Qwen1.5-4B on the tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3 dataset. At the final checkpoint (step 9350, epoch 49.87), it achieves the following results on the evaluation set:
- Loss: 3.1517
- Accuracy: 0.5758
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
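The totals in the list above follow from the per-device values, and the fractional epoch numbers in the results table follow from the number of optimizer steps per epoch. The sketch below reproduces that arithmetic. The training-set size of 6000 is an assumption inferred from "train6000" in the dataset name; it is not stated elsewhere in this card, although it is consistent with the step and epoch columns reported under Training results.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 1              # per-device training batch size
eval_batch_size = 2               # per-device evaluation batch size
num_devices = 4
gradient_accumulation_steps = 8

# Assumption: 6000 training examples, inferred from "train6000" in the dataset name.
num_train_examples = 6000

# One optimizer step consumes this many examples across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only the device count multiplies.
total_eval_batch_size = eval_batch_size * num_devices

# Optimizer steps needed to see every training example once.
steps_per_epoch = num_train_examples / total_train_batch_size


def epoch_at_step(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(total_train_batch_size)   # 32
print(total_eval_batch_size)    # 8
print(steps_per_epoch)          # 187.5
print(epoch_at_step(187))       # 0.9973
print(epoch_at_step(9350))      # 49.8667
```

Because 187.5 steps do not divide evenly, evaluation lands just short of every odd epoch boundary (0.9973, 2.9973, ...) and exactly on every even one.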
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy |
---|---|---|---|---|
1.7461 | 0.9973 | 187 | 1.6695 | 0.6093 |
1.4286 | 2.0 | 375 | 1.6994 | 0.6087 |
1.0309 | 2.9973 | 562 | 1.8122 | 0.6053 |
0.7019 | 4.0 | 750 | 1.9749 | 0.5989 |
0.4634 | 4.9973 | 937 | 2.1953 | 0.5948 |
0.3066 | 6.0 | 1125 | 2.3726 | 0.5917 |
0.2171 | 6.9973 | 1312 | 2.5298 | 0.5900 |
0.1742 | 8.0 | 1500 | 2.5951 | 0.5903 |
0.1376 | 8.9973 | 1687 | 2.6984 | 0.5896 |
0.1325 | 10.0 | 1875 | 2.7171 | 0.5886 |
0.133 | 10.9973 | 2062 | 2.7434 | 0.5879 |
0.1327 | 12.0 | 2250 | 2.7609 | 0.5874 |
0.1387 | 12.9973 | 2437 | 2.7902 | 0.5862 |
0.14 | 14.0 | 2625 | 2.8040 | 0.5855 |
0.1405 | 14.9973 | 2812 | 2.8384 | 0.5847 |
0.1373 | 16.0 | 3000 | 2.8371 | 0.5851 |
0.1192 | 16.9973 | 3187 | 2.8795 | 0.5842 |
0.121 | 18.0 | 3375 | 2.8855 | 0.5849 |
0.1234 | 18.9973 | 3562 | 2.9039 | 0.5839 |
0.1249 | 20.0 | 3750 | 2.9099 | 0.5823 |
0.1254 | 20.9973 | 3937 | 2.9210 | 0.5824 |
0.1263 | 22.0 | 4125 | 2.9261 | 0.5828 |
0.1252 | 22.9973 | 4312 | 2.9145 | 0.5841 |
0.1275 | 24.0 | 4500 | 2.9659 | 0.5830 |
0.1148 | 24.9973 | 4687 | 2.9863 | 0.5819 |
0.1146 | 26.0 | 4875 | 2.9748 | 0.582 |
0.1157 | 26.9973 | 5062 | 2.9689 | 0.5827 |
0.1187 | 28.0 | 5250 | 3.0127 | 0.5816 |
0.1221 | 28.9973 | 5437 | 3.0430 | 0.5826 |
0.1227 | 30.0 | 5625 | 2.9849 | 0.5816 |
0.1242 | 30.9973 | 5812 | 2.9764 | 0.5814 |
0.1244 | 32.0 | 6000 | 3.0284 | 0.5806 |
0.1111 | 32.9973 | 6187 | 3.0857 | 0.5803 |
0.1112 | 34.0 | 6375 | 3.0586 | 0.5799 |
0.1139 | 34.9973 | 6562 | 3.0457 | 0.5803 |
0.1132 | 36.0 | 6750 | 3.0704 | 0.5781 |
0.116 | 36.9973 | 6937 | 3.0578 | 0.5810 |
0.1169 | 38.0 | 7125 | 3.0881 | 0.5814 |
0.1176 | 38.9973 | 7312 | 3.0958 | 0.5787 |
0.1203 | 40.0 | 7500 | 3.1192 | 0.5788 |
0.1105 | 40.9973 | 7687 | 3.0805 | 0.5788 |
0.1135 | 42.0 | 7875 | 3.0892 | 0.5786 |
0.1148 | 42.9973 | 8062 | 3.1191 | 0.5767 |
0.1141 | 44.0 | 8250 | 3.0916 | 0.5770 |
0.1121 | 44.9973 | 8437 | 3.1581 | 0.5762 |
0.1121 | 46.0 | 8625 | 3.1800 | 0.5775 |
0.1147 | 46.9973 | 8812 | 3.1482 | 0.5770 |
0.117 | 48.0 | 9000 | 3.1531 | 0.5780 |
0.1057 | 48.9973 | 9187 | 3.1905 | 0.5781 |
0.1085 | 49.8667 | 9350 | 3.1517 | 0.5758 |
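The table shows validation loss rising from the first evaluation onward while training loss falls, so the final checkpoint is not the one with the best evaluation metrics. The sketch below picks the best row by validation loss from a hand-copied subset of the table. The `Row` type and variable names are illustrative only; they are not part of the training code.

```python
from typing import NamedTuple


class Row(NamedTuple):
    epoch: float
    step: int
    train_loss: float
    val_loss: float
    accuracy: float


# Subset of rows copied from the results table above.
rows = [
    Row(0.9973, 187, 1.7461, 1.6695, 0.6093),
    Row(2.0, 375, 1.4286, 1.6994, 0.6087),
    Row(10.0, 1875, 0.1325, 2.7171, 0.5886),
    Row(20.0, 3750, 0.1249, 2.9099, 0.5823),
    Row(30.0, 5625, 0.1227, 2.9849, 0.5816),
    Row(40.0, 7500, 0.1203, 3.1192, 0.5788),
    Row(49.8667, 9350, 0.1085, 3.1517, 0.5758),
]

# Checkpoint with the lowest validation loss.
best = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

# Gap between validation and training loss at the end of training.
final_gap = round(final.val_loss - final.train_loss, 4)
# Accuracy lost between the best and the final checkpoint.
accuracy_drop = round(best.accuracy - final.accuracy, 4)

print(best.step, best.val_loss, best.accuracy)   # 187 1.6695 0.6093
print(final_gap)                                 # 3.0432
print(accuracy_drop)                             # 0.0335
```

In the full table, step 187 is also the row with the lowest validation loss and the highest accuracy, which suggests that most of the 50 epochs fit the training set without improving evaluation performance.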
Framework versions
- PEFT 0.5.0
- Transformers 4.41.1
- Pytorch 2.1.0+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
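Since PEFT is listed among the framework versions and the model name ends in "lora2", the repository most likely holds LoRA adapter weights rather than a full checkpoint. Under that assumption, a minimal loading sketch might look like the following. The helper name `load_model_with_adapter` is hypothetical, and the card does not document the prompt format used for training, so none is shown.

```python
BASE_MODEL_ID = "Qwen/Qwen1.5-4B"
ADAPTER_ID = "tyzhu/lmind_nq_train6000_eval6489_v1_reciteonly_qa_v3_5e-4_lora2"


def load_model_with_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    # Imported lazily so that defining this helper does not require the
    # heavy dependencies or trigger any download.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_model_with_adapter()` downloads both the base model and the adapter, so it needs network access and enough memory for a 4B-parameter model.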