	---
	library_name: transformers
	license: other
	base_model: hon9kon9ize/CantoneseLLM-v1.0
	tags:
	- llama-factory
	- full
	- generated_from_trainer
	model-index:
	- name: Qwen2.5-7B-sft
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# Qwen2.5-7B-sft

	This model is a fine-tuned version of [hon9kon9ize/CantoneseLLM-v1.0](https://huggingface.co/hon9kon9ize/CantoneseLLM-v1.0) on the sft_v1 dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.9464

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_ratio: 0.3
	- num_epochs: 3.0
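
The values above fit together arithmetically: 4 samples per device times 8 accumulation steps already gives the reported total batch size of 32, which implies a single device. The sketch below reproduces a linear-warmup, cosine-decay schedule of the shape that `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.3` describes. The total step count is an assumption: the card does not state it, so `TOTAL_STEPS` is estimated from the results table (step 6200 at epoch 2.9749, about 2084 steps per epoch). Treat this as an illustration, not the exact scheduler code used in training.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 4          # per-device batch size
GRAD_ACCUM_STEPS = 8
TOTAL_TRAIN_BATCH_SIZE = 32
WARMUP_RATIO = 0.3

# ASSUMPTION: the card does not state the total number of optimizer steps.
# The results table logs step 6200 at epoch 2.9749 (about 2084 steps per
# epoch), so 3 epochs come to roughly 6252 steps.
TOTAL_STEPS = 6252


def implied_num_devices() -> int:
    """Number of devices implied by the reported batch sizes."""
    return TOTAL_TRAIN_BATCH_SIZE // (TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS)


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("implied devices:", implied_num_devices())
    for step in (0, 1000, 1876, 4000, 6252):
        print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks near step 1876 (about epoch 0.9) and decays towards zero by the end of the third epoch.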

	### Training results

	| Training Loss | Epoch  | Step | Validation Loss |
	|:-------------:|:------:|:----:|:---------------:|
	| 1.3332 | 0.0480 | 100 | 1.3140 |
	| 1.2185 | 0.0960 | 200 | 1.2879 |
	| 1.1976 | 0.1439 | 300 | 1.2533 |
	| 1.1627 | 0.1919 | 400 | 1.2169 |
	| 1.178 | 0.2399 | 500 | 1.1766 |
	| 1.133 | 0.2879 | 600 | 1.1296 |
	| 1.0466 | 0.3359 | 700 | 1.0983 |
	| 1.0657 | 0.3839 | 800 | 1.0770 |
	| 1.054 | 0.4318 | 900 | 1.0617 |
	| 1.0744 | 0.4798 | 1000 | 1.0487 |
	| 0.9977 | 0.5278 | 1100 | 1.0383 |
	| 0.9778 | 0.5758 | 1200 | 1.0290 |
	| 1.0187 | 0.6238 | 1300 | 1.0211 |
	| 1.085 | 0.6717 | 1400 | 1.0131 |
	| 0.958 | 0.7197 | 1500 | 1.0072 |
	| 1.0482 | 0.7677 | 1600 | 1.0007 |
	| 0.9447 | 0.8157 | 1700 | 0.9946 |
	| 1.0 | 0.8637 | 1800 | 0.9894 |
	| 0.9685 | 0.9117 | 1900 | 0.9849 |
	| 0.8576 | 0.9596 | 2000 | 0.9807 |
	| 0.8853 | 1.0076 | 2100 | 0.9775 |
	| 0.947 | 1.0556 | 2200 | 0.9739 |
	| 0.9207 | 1.1036 | 2300 | 0.9713 |
	| 0.8596 | 1.1516 | 2400 | 0.9691 |
	| 1.0277 | 1.1995 | 2500 | 0.9655 |
	| 0.9646 | 1.2475 | 2600 | 0.9631 |
	| 0.8583 | 1.2955 | 2700 | 0.9613 |
	| 0.9367 | 1.3435 | 2800 | 0.9589 |
	| 0.9146 | 1.3915 | 2900 | 0.9570 |
	| 0.9697 | 1.4395 | 3000 | 0.9556 |
	| 0.8713 | 1.4874 | 3100 | 0.9542 |
	| 0.9855 | 1.5354 | 3200 | 0.9524 |
	| 0.8651 | 1.5834 | 3300 | 0.9511 |
	| 0.9448 | 1.6314 | 3400 | 0.9495 |
	| 0.8997 | 1.6794 | 3500 | 0.9485 |
	| 1.0446 | 1.7273 | 3600 | 0.9475 |
	| 0.8862 | 1.7753 | 3700 | 0.9465 |
	| 0.873 | 1.8233 | 3800 | 0.9456 |
	| 0.9893 | 1.8713 | 3900 | 0.9448 |
	| 0.8915 | 1.9193 | 4000 | 0.9442 |
	| 0.8854 | 1.9673 | 4100 | 0.9435 |
	| 0.7608 | 2.0152 | 4200 | 0.9447 |
	| 0.796 | 2.0632 | 4300 | 0.9464 |
	| 0.9225 | 2.1112 | 4400 | 0.9467 |
	| 0.9901 | 2.1592 | 4500 | 0.9467 |
	| 0.9263 | 2.2072 | 4600 | 0.9468 |
	| 0.7735 | 2.2551 | 4700 | 0.9467 |
	| 0.8454 | 2.3031 | 4800 | 0.9464 |
	| 0.8562 | 2.3511 | 4900 | 0.9466 |
	| 0.8923 | 2.3991 | 5000 | 0.9464 |
	| 0.7529 | 2.4471 | 5100 | 0.9463 |
	| 0.8421 | 2.4951 | 5200 | 0.9463 |
	| 0.8578 | 2.5430 | 5300 | 0.9463 |
	| 0.8143 | 2.5910 | 5400 | 0.9464 |
	| 0.8117 | 2.6390 | 5500 | 0.9463 |
	| 0.861 | 2.6870 | 5600 | 0.9464 |
	| 0.8415 | 2.7350 | 5700 | 0.9463 |
	| 0.7846 | 2.7829 | 5800 | 0.9463 |
	| 0.7605 | 2.8309 | 5900 | 0.9464 |
	| 0.8721 | 2.8789 | 6000 | 0.9464 |
	| 0.8566 | 2.9269 | 6100 | 0.9464 |
	| 0.7978 | 2.9749 | 6200 | 0.9464 |
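
The lowest validation loss in the table is 0.9435 at step 4100, slightly below the final 0.9464, while the training loss keeps falling through the third epoch. The sketch below shows one way to pull that out of a log like this programmatically. The excerpt holds only a few rows copied from the table, and `parse_rows` and `LogRow` are hypothetical helper names, not part of any library. Whether a checkpoint from step 4100 was saved is not stated in the card.

```python
from typing import NamedTuple


class LogRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


# A few rows copied from the table above; paste the full table here to
# analyse every logged evaluation.
TABLE_EXCERPT = """
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.3332 | 0.0480 | 100 | 1.3140 |
| 0.8576 | 0.9596 | 2000 | 0.9807 |
| 0.8915 | 1.9193 | 4000 | 0.9442 |
| 0.8854 | 1.9673 | 4100 | 0.9435 |
| 0.7608 | 2.0152 | 4200 | 0.9447 |
| 0.796 | 2.0632 | 4300 | 0.9464 |
| 0.7978 | 2.9749 | 6200 | 0.9464 |
"""


def parse_rows(markdown_table: str) -> list[LogRow]:
    """Parse data rows of a 4-column Markdown table, skipping header and separator."""
    rows = []
    for line in markdown_table.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        try:
            train_loss, epoch, step, val_loss = cells
            rows.append(LogRow(float(train_loss), float(epoch), int(step), float(val_loss)))
        except ValueError:
            continue  # header, separator, or malformed line
    return rows


rows = parse_rows(TABLE_EXCERPT)
best = min(rows, key=lambda row: row.val_loss)
final = rows[-1]

if __name__ == "__main__":
    print(f"best:  step {best.step}, validation loss {best.val_loss}")
    print(f"final: step {final.step}, validation loss {final.val_loss}")
```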


	### Framework versions

	- Transformers 4.45.0
	- PyTorch 2.4.1+cu121
	- Datasets 2.20.0
	- Tokenizers 0.20.0
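
To check a local environment against the versions listed above, a small comparison along these lines may help. It uses only the standard library, and `compare_with_card` is a hypothetical helper name. The dictionary keys are assumed to be the PyPI distribution names (PyTorch is distributed as `torch`). An installed PyTorch build may report a different or missing local suffix such as `+cu121` depending on how it was installed.

```python
from importlib import metadata

# Versions listed in the card, keyed by (assumed) PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.45.0",
    "torch": "2.4.1+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.20.0",
}


def compare_with_card(expected=CARD_VERSIONS):
    """Return {package: (card_version, installed_version_or_None, exact_match)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_with_card().items():
        status = "ok" if ok else "differs"
        print(f"{name}: card={wanted} installed={installed} ({status})")
```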
