---
base_model: huggyllama/llama-13b
library_name: peft
license: other
tags:
- generated_from_trainer
model-index:
- name: llama-13b_alpaca-clean_l0.0002_64
  results: []
---

# llama-13b_alpaca-clean_l0.0002_64

This model is a PEFT fine-tune of [huggyllama/llama-13b](https://huggingface.co/huggyllama/llama-13b) on the alpaca-clean dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5226

## Model description

This repository holds a parameter-efficient (PEFT) adapter rather than a full copy of the model weights, so the base model is required at load time. Reading the run name, `l0.0002` encodes the learning rate (2e-4) and `_64` most likely the LoRA rank; neither is otherwise documented in this card.
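
A minimal loading sketch, assuming the standard PEFT adapter layout; the adapter path `llama-13b_alpaca-clean_l0.0002_64` below stands in for wherever this repository's files actually live:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model first; the adapter only stores low-rank deltas.
base = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-13b")

# Attach this repository's PEFT adapter on top of the base weights.
model = PeftModel.from_pretrained(base, "llama-13b_alpaca-clean_l0.0002_64")
model.eval()
```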

## Intended uses & limitations

Trained on alpaca-clean, the adapter is intended for English instruction following and short assistant-style responses. It inherits the restrictive license and known weaknesses of the LLaMA base model, may hallucinate facts, and shows signs of overfitting: validation loss bottoms out near epoch 1 (~1.44) and drifts upward through epochs 2-3 (see the training results below).
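
Continuing from the loading sketch above, a generation example; the Alpaca prompt template is an assumption (it is the conventional format for alpaca-style data, but the exact prompt used in training is not recorded in this card):

```python
# Standard Alpaca template; adjust if outputs look off, since the exact
# training prompt format is an assumption, not a recorded value.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a LoRA adapter is.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Strip the prompt tokens and decode only the generated continuation.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```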

## Training and evaluation data

The run name indicates the cleaned Alpaca instruction dataset (alpaca-clean), a de-noised revision of the original Stanford Alpaca data. The evaluation losses reported in this card presumably come from a held-out split of the same data; the split itself is not documented.
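
To inspect the data, a sketch assuming the widely used `yahma/alpaca-cleaned` Hub copy of the cleaned Alpaca set; the exact variant used for this run is not recorded:

```python
from datasets import load_dataset

# Assumption: "yahma/alpaca-cleaned" is the alpaca-clean variant meant here.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
print(dataset[0])  # fields: "instruction", "input", "output"
```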

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 1
- seed: 0
- gradient_accumulation_steps: 16
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.03
- training_steps: 0 (likely a logging artifact; the results table below covers ~9,350 steps over roughly three epochs)
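
For reference, a hedged sketch of how these settings map onto `transformers.TrainingArguments`; `output_dir`, the epoch count, and the logging/eval cadence are read off the results table below rather than recorded values:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="llama-13b_alpaca-clean_l0.0002_64",  # assumption
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=0,
    gradient_accumulation_steps=16,  # effective train batch size: 16
    lr_scheduler_type="constant",
    warmup_ratio=0.03,               # ignored by a purely constant schedule
    num_train_epochs=3,              # assumption: the log spans ~3 epochs
    eval_strategy="steps",
    eval_steps=187,                  # matches the cadence in the table below
    logging_steps=187,
)
```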

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.1456 | 0.0003 | 1 | 2.3431 |
| 2.0148 | 0.0590 | 187 | 1.5488 |
| 1.2875 | 0.1179 | 374 | 1.5405 |
| 1.0668 | 0.1769 | 561 | 1.5543 |
| 1.8093 | 0.2359 | 748 | 1.4975 |
| 1.6438 | 0.2949 | 935 | 1.4835 |
| 1.2251 | 0.3538 | 1122 | 1.4810 |
| 1.0625 | 0.4128 | 1309 | 1.4741 |
| 1.8002 | 0.4718 | 1496 | 1.4496 |
| 1.5269 | 0.5307 | 1683 | 1.4526 |
| 1.1458 | 0.5897 | 1870 | 1.4545 |
| 1.0543 | 0.6487 | 2057 | 1.4612 |
| 2.2424 | 0.7077 | 2244 | 1.4442 |
| 1.593 | 0.7666 | 2431 | 1.4435 |
| 1.0416 | 0.8256 | 2618 | 1.4539 |
| 0.9933 | 0.8846 | 2805 | 1.4524 |
| 1.9771 | 0.9436 | 2992 | 1.4390 |
| 0.9435 | 1.0025 | 3179 | 1.4399 |
| 2.3091 | 1.0615 | 3366 | 1.4685 |
| 1.3242 | 1.1205 | 3553 | 1.4607 |
| 1.1381 | 1.1794 | 3740 | 1.4711 |
| 0.907 | 1.2384 | 3927 | 1.4860 |
| 1.752 | 1.2974 | 4114 | 1.4583 |
| 1.0621 | 1.3564 | 4301 | 1.4590 |
| 0.9417 | 1.4153 | 4488 | 1.4633 |
| 1.0226 | 1.4743 | 4675 | 1.4648 |
| 1.8375 | 1.5333 | 4862 | 1.4569 |
| 1.3047 | 1.5922 | 5049 | 1.4614 |
| 0.9083 | 1.6512 | 5236 | 1.4736 |
| 0.9209 | 1.7102 | 5423 | 1.4640 |
| 1.6807 | 1.7692 | 5610 | 1.4494 |
| 1.0549 | 1.8281 | 5797 | 1.4558 |
| 0.9171 | 1.8871 | 5984 | 1.4559 |
| 2.0487 | 1.9461 | 6171 | 1.4512 |
| 0.8636 | 2.0050 | 6358 | 1.4486 |
| 0.8722 | 2.0640 | 6545 | 1.5880 |
| 1.2758 | 2.1230 | 6732 | 1.5332 |
| 0.9294 | 2.1820 | 6919 | 1.5220 |
| 0.9638 | 2.2409 | 7106 | 1.5444 |
| 0.9522 | 2.2999 | 7293 | 1.5982 |
| 1.0788 | 2.3589 | 7480 | 1.5257 |
| 1.0903 | 2.4178 | 7667 | 1.5385 |
| 0.9291 | 2.4768 | 7854 | 1.5559 |
| 1.0212 | 2.5358 | 8041 | 1.5356 |
| 1.3065 | 2.5948 | 8228 | 1.5146 |
| 0.9102 | 2.6537 | 8415 | 1.5322 |
| 0.8117 | 2.7127 | 8602 | 1.5404 |
| 1.4213 | 2.7717 | 8789 | 1.5409 |
| 1.1398 | 2.8307 | 8976 | 1.5152 |
| 0.9868 | 2.8896 | 9163 | 1.5408 |
| 0.8449 | 2.9486 | 9350 | 1.5555 |

### Framework versions

- PEFT 0.12.1.dev0
- Transformers 4.45.0.dev0
- Pytorch 2.3.0+cu121
- Datasets 2.19.0
- Tokenizers 0.19.1