joedonino/zephyr-7b-radia-html-events-v8

	---
	license: mit
	base_model: HuggingFaceH4/zephyr-7b-beta
	tags:
	- generated_from_trainer
	model-index:
	- name: models
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# models

	This model is a fine-tuned version of [HuggingFaceH4/zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) on an unspecified dataset (the Trainer recorded no dataset name).
	It achieves the following results on the evaluation set:
	- Loss: 0.7368

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0004
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- training_steps: 40
	- mixed_precision_training: Native AMP
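The hyperparameters above interact in two ways worth checking: the total batch size is the per-device batch size times the accumulation steps, and the run stops (40 steps) before the 100-step warmup finishes. The sketch below works through both. It assumes a single device and assumes the linear scheduler follows the usual "linear warmup, then linear decay" shape; the function names are illustrative and not taken from the training script.

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch * grad_accum_steps * num_devices


def linear_warmup_lr(step, base_lr=4e-4, warmup_steps=100, total_steps=40):
    """Learning rate at 0-indexed optimizer step `step`.

    Assumed shape: linear warmup from 0 to base_lr over `warmup_steps`,
    then linear decay to 0 at `total_steps`.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(8, 2))
    # All 40 steps fall inside the warmup window, so the nominal
    # learning rate of 0.0004 is never reached under these assumptions.
    peak = max(linear_warmup_lr(s) for s in range(40))
    print(f"highest learning rate during the run: {peak:.3e}")
```

Under these assumptions the learning rate rises linearly for the whole run and ends well below the configured 0.0004, which is worth keeping in mind when comparing this run against others that use the same nominal rate.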

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 0.9828 | 0.02 | 1 | 1.0330 |
	| 1.1538 | 0.03 | 2 | 1.0256 |
	| 0.9734 | 0.05 | 3 | 1.0120 |
	| 1.0574 | 0.07 | 4 | 0.9942 |
	| 0.9797 | 0.09 | 5 | 0.9755 |
	| 0.9399 | 0.1 | 6 | 0.9580 |
	| 1.0294 | 0.12 | 7 | 0.9434 |
	| 0.7965 | 0.14 | 8 | 0.9318 |
	| 0.7741 | 0.16 | 9 | 0.9236 |
	| 0.8252 | 0.17 | 10 | 0.9178 |
	| 0.8478 | 0.19 | 11 | 0.9135 |
	| 0.9641 | 0.21 | 12 | 0.9068 |
	| 0.9073 | 0.22 | 13 | 0.8980 |
	| 0.9682 | 0.24 | 14 | 0.8877 |
	| 0.8794 | 0.26 | 15 | 0.8774 |
	| 0.7602 | 0.28 | 16 | 0.8690 |
	| 0.9019 | 0.29 | 17 | 0.8611 |
	| 0.8619 | 0.31 | 18 | 0.8547 |
	| 0.8195 | 0.33 | 19 | 0.8484 |
	| 0.9562 | 0.34 | 20 | 0.8418 |
	| 0.7822 | 0.36 | 21 | 0.8366 |
	| 0.767 | 0.38 | 22 | 0.8308 |
	| 0.9024 | 0.4 | 23 | 0.8242 |
	| 0.8596 | 0.41 | 24 | 0.8183 |
	| 0.8424 | 0.43 | 25 | 0.8123 |
	| 0.7396 | 0.45 | 26 | 0.8059 |
	| 0.7742 | 0.47 | 27 | 0.7999 |
	| 0.7007 | 0.48 | 28 | 0.7943 |
	| 0.6915 | 0.5 | 29 | 0.7890 |
	| 0.7054 | 0.52 | 30 | 0.7836 |
	| 0.7622 | 0.53 | 31 | 0.7785 |
	| 0.6493 | 0.55 | 32 | 0.7720 |
	| 0.6106 | 0.57 | 33 | 0.7650 |
	| 0.7534 | 0.59 | 34 | 0.7583 |
	| 0.7065 | 0.6 | 35 | 0.7532 |
	| 0.8823 | 0.62 | 36 | 0.7472 |
	| 0.7082 | 0.64 | 37 | 0.7424 |
	| 0.7292 | 0.66 | 38 | 0.7405 |
	| 0.8142 | 0.67 | 39 | 0.7390 |
	| 0.6079 | 0.69 | 40 | 0.7368 |
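A few summary figures can be read off the first and last rows of the results table. The sketch below computes the drop in validation loss and a rough estimate of the training-set size from the epoch fraction. The size estimate is only an approximation: it assumes the reported total batch size of 16 applies to every step and that the epoch column is rounded to two decimals. The function name is illustrative.

```python
def summarize_run(first_val_loss, last_val_loss, final_step, final_epoch, total_batch):
    """Summarize a run from the first and last rows of its results table."""
    abs_drop = first_val_loss - last_val_loss
    rel_drop = abs_drop / first_val_loss
    # Epoch is reported as a rounded fraction, so these are estimates only.
    steps_per_epoch = final_step / final_epoch
    approx_examples = steps_per_epoch * total_batch
    return {
        "abs_drop": abs_drop,
        "rel_drop": rel_drop,
        "steps_per_epoch": steps_per_epoch,
        "approx_examples": approx_examples,
    }


if __name__ == "__main__":
    # Values copied from step 1 and step 40 of the table.
    summary = summarize_run(1.0330, 0.7368, final_step=40, final_epoch=0.69, total_batch=16)
    for key, value in summary.items():
        print(f"{key}: {value:.4f}")
```

The validation loss is still falling at step 40 and the run covers less than one epoch, so the reported final loss describes a run that was stopped early by the step budget rather than one that had converged.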


	### Framework versions

	- Transformers 4.36.0.dev0
	- Pytorch 2.1.0+cu118
	- Datasets 2.15.0
	- Tokenizers 0.15.0
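To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch that uses only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual PyPI names and are an assumption about how the packages were installed. The `.dev0` build of Transformers is a development version, so an exact match may not be possible with a released package.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in this model card.
CARD_VERSIONS = {
    "transformers": "4.36.0.dev0",
    "torch": "2.1.0+cu118",
    "datasets": "2.15.0",
    "tokenizers": "0.15.0",
}


def compare_versions(expected):
    """Return {package: (expected, installed_or_None, exact_match)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, ok) in compare_versions(CARD_VERSIONS).items():
        status = "match" if ok else "differs"
        print(f"{name}: card={wanted} installed={installed} ({status})")
```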
