cuneytkaya/fine-tuned-t5-small-turkish-mmlu


	---
	license: apache-2.0
	datasets:
	- alibayram/turkish_mmlu
	language:
	- tr
	base_model:
	- google-t5/t5-small
	---
	# fine-tuned-t5-small-turkish-mmlu

	This model is a fine-tuned [T5-Small](https://huggingface.co/google-t5/t5-small) for Turkish question answering. It was trained on the [Turkish MMLU](https://huggingface.co/datasets/alibayram/turkish_mmlu) dataset, which consists of questions from academic and professional exams in Turkey, including KPSS and TUS. The model takes a Turkish question as input and generates an answer as text, following T5's text-to-text formulation.
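
The snippet below is a minimal inference sketch, not a documented usage recipe. It assumes the checkpoint loads through the standard `transformers` seq2seq classes and that the model expects the bare question with no task prefix; the card does not state the input format. The helper names `build_input` and `answer` and the sample question are illustrative.

```python
MODEL_ID = "cuneytkaya/fine-tuned-t5-small-turkish-mmlu"


def build_input(question: str) -> str:
    """Normalize whitespace. Assumption: the model takes the bare question."""
    return " ".join(question.split())


def answer(question: str, max_new_tokens: int = 32) -> str:
    """Generate an answer for one Turkish question."""
    # Imported lazily so build_input can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(build_input(question), return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Hypothetical sample question ("What is the capital of Turkey?").
    print(answer("Türkiye'nin başkenti neresidir?"))
```

If the model was trained with a prompt prefix or with answer choices appended to the question, `build_input` would need to reproduce that format.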

	### Training Data

	@dataset{bayram_2024_13378019,
	author = {Bayram, M. Ali},
	title = {{Turkish MMLU: Yapay Zeka ve Akademik Uygulamalar
	İçin En Kapsamlı ve Özgün Türkçe Veri Seti}},
	month = aug,
	year = 2024,
	publisher = {Zenodo},
	version = {v1.2},
	doi = {10.5281/zenodo.13378019},
	url = {https://doi.org/10.5281/zenodo.13378019}
	}


	#### Training Hyperparameters

	- `learning_rate=5e-5`
	- `per_device_train_batch_size=8`
	- `per_device_eval_batch_size=8`
	- `num_train_epochs=3`
	- `weight_decay=0.01`
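
The names above match keyword arguments of the `transformers` training-arguments classes, so they could be passed along as in the sketch below. The card does not say which class or trainer was used; `Seq2SeqTrainingArguments` is an assumption here, and `output_dir` is a hypothetical path.

```python
# Values copied from the hyperparameters listed above.
HYPERPARAMS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
}


def build_training_args(output_dir: str = "./t5-small-turkish-mmlu"):
    """Build training arguments; output_dir is a hypothetical location."""
    # Imported lazily so HYPERPARAMS can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


if __name__ == "__main__":
    print(build_training_args())
```

All other settings, such as the optimizer and learning-rate schedule, fall back to the library defaults in this sketch and may differ from the original run.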


	#### Training Results


	![image/png](https://cdn-uploads.huggingface.co/production/uploads/669a700b990749decaab29af/xgl-5aCReHq8nA4RxgxhC.png)



	#### Metrics

	Training loss was monitored to track how well the model fits the training data. After 3 epochs, the model reached a training loss of 0.0749, which indicates a close fit to that data. Training loss alone does not measure generalization or rule out overfitting; that requires evaluating on held-out questions.
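
As a complement to the loss value, answer quality can be scored directly. The sketch below computes an exact-match rate between generated and reference answers. This metric is not reported in this card, and the function names and sample strings are hypothetical.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Trim and collapse whitespace; case is left untouched."""
    return " ".join(text.split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


if __name__ == "__main__":
    # Hypothetical outputs and gold answers, for illustration only.
    print(exact_match(["Ankara", " B "], ["Ankara", "C"]))
```

Case folding is deliberately omitted because Python's default lowercasing does not follow Turkish rules for `I`/`ı` and `İ`/`i`.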