jed351 / gpt2_tiny_zh-hk-shikoto

	---
	tags:
	- generated_from_trainer
	datasets:
	- jed351/shikoto_zh_hk
	metrics:
	- accuracy
	model-index:
	- name: gpt2-shikoto
	  results:
	  - task:
	      name: Causal Language Modeling
	      type: text-generation
	    dataset:
	      name: jed351/shikoto_zh_hk
	      type: jed351/shikoto_zh_hk
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.37381769930940056
	license: openrail
	---


	# gpt2-shikoto

	This model was trained on the `jed351/shikoto_zh_hk` dataset, which I obtained from an online novel site.
	Please be aware that the stories in the training data may contain inappropriate content. This model is intended for research purposes only.



	The base model can be found [here](https://huggingface.co/jed351/gpt2-tiny-zh-hk), which was obtained by
	patching a [GPT2 Chinese model](https://huggingface.co/ckiplab/gpt2-tiny-chinese) and its tokenizer with Cantonese characters.
	Refer to the base model's card for details of the patching process.




	## Training procedure

	Please refer to the language-modeling [script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling)
	provided by Hugging Face.
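As a rough illustration of how the hyperparameters listed in this card could map onto that script, the sketch below assembles a `run_clm.py` command line. It is not the author's actual invocation: the `torchrun` launcher, the `--dataset_name` value being loadable directly by the script, and the `./gpt2-shikoto` output directory are all assumptions made for the example.

```python
import shlex

# Values copied from the "Training hyperparameters" list in this card.
NUM_GPUS = 2
ARGS = {
    "--model_name_or_path": "jed351/gpt2-tiny-zh-hk",  # base model named in this card
    "--dataset_name": "jed351/shikoto_zh_hk",          # assumption: loadable by the script as-is
    "--per_device_train_batch_size": 20,
    "--per_device_eval_batch_size": 20,
    "--learning_rate": 5e-5,
    "--lr_scheduler_type": "linear",
    "--max_steps": 400_000,
    "--seed": 42,
    "--output_dir": "./gpt2-shikoto",                   # hypothetical output path
}
# --fp16 is one way to request native mixed precision with the Trainer.
FLAGS = ["--do_train", "--do_eval", "--fp16"]


def build_command():
    """Return the launch command as a list of strings."""
    # Assumption: a torchrun launch with one process per GPU.
    cmd = ["torchrun", f"--nproc_per_node={NUM_GPUS}", "run_clm.py"]
    for name, value in ARGS.items():
        cmd += [name, str(value)]
    return cmd + FLAGS


command = build_command()
print(shlex.join(command))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; paste the output into a shell only after checking each flag against the version of the script you are using.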


	The model was trained for 400,000 steps on two NVIDIA Quadro RTX 6000 GPUs for around 15 hours at the Research Computing Services of Imperial College London.
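For a sense of scale, the figures above can be turned into an approximate throughput. This is back-of-the-envelope arithmetic only: the 15 hours is the card's own rounded estimate, and the total batch size of 40 is taken from the hyperparameter list in this card.

```python
# Figures quoted in this card; the wall-clock time is approximate.
TRAINING_STEPS = 400_000
WALL_CLOCK_HOURS = 15
TOTAL_TRAIN_BATCH_SIZE = 40  # 20 per device on 2 devices

steps_per_second = TRAINING_STEPS / (WALL_CLOCK_HOURS * 3600)
total_sequences = TRAINING_STEPS * TOTAL_TRAIN_BATCH_SIZE
sequences_per_second = steps_per_second * TOTAL_TRAIN_BATCH_SIZE

print(f"{steps_per_second:.2f} optimizer steps per second")
print(f"{total_sequences:,} training sequences processed in total")
print(f"{sequences_per_second:.0f} sequences per second across both GPUs")
```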


	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 20
	- eval_batch_size: 20
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 2
	- total_train_batch_size: 40
	- total_eval_batch_size: 40
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- training_steps: 400000
	- mixed_precision_training: Native AMP
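To make the `learning_rate`, `lr_scheduler_type`, and `training_steps` entries concrete, here is a minimal pure-Python sketch of a linear decay schedule. The card does not list any warmup steps, so `WARMUP_STEPS = 0` is an assumption, and `linear_lr` is a hypothetical helper written for illustration rather than the Trainer's own code.

```python
BASE_LR = 5e-5            # learning_rate from the list above
TRAINING_STEPS = 400_000  # training_steps from the list above
WARMUP_STEPS = 0          # assumption: no warmup is listed in the card


def linear_lr(step, base_lr=BASE_LR, total_steps=TRAINING_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at a given step under linear warmup then linear decay to zero."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * (step / max(1, warmup_steps))
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


for step in (0, 100_000, 200_000, 300_000, 400_000):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate falls by the same amount at every step, reaching half its initial value at step 200,000 and zero at the final step.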

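	### Training results

	The model card metadata reports an accuracy of 0.3738 (0.37381769930940056) on `jed351/shikoto_zh_hk` for causal language modeling.
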

	### How to use

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline

	# The tokenizer comes from the base model; the weights come from this fine-tuned model.
	tokenizer = AutoTokenizer.from_pretrained("jed351/gpt2-tiny-zh-hk")
	model = AutoModelForCausalLM.from_pretrained("jed351/gpt2_tiny_zh-hk-shikoto")

	# Try adjusting the generation parameters.
	generator = TextGenerationPipeline(
	    model,
	    tokenizer,
	    max_new_tokens=200,
	    no_repeat_ngram_size=3,
	    # device=0,  # uncomment if you have a GPU
	)

	input_string = "your input"

	output = generator(input_string)
	# Remove the spaces that appear between tokens in the decoded text.
	string = output[0]["generated_text"].replace(" ", "")
	print(string)
	```
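The `.replace(' ', '')` call in the snippet above removes every space, which also glues together any Latin-script words in the output. If your prompts mix Cantonese with English, a gentler clean-up may be preferable. The sketch below is one possible approach, not part of the original card: `strip_cjk_spaces` is a hypothetical helper, and the character ranges it treats as CJK are an assumption that may need widening for your text.

```python
import re

# Assumption: CJK punctuation, common CJK ideographs, and full-width forms
# cover the characters that should not be separated by spaces.
_CJK = r"\u3000-\u303f\u4e00-\u9fff\uff00-\uffef"
_SPACE_NEXT_TO_CJK = re.compile(rf"(?<=[{_CJK}])\s+|\s+(?=[{_CJK}])")


def strip_cjk_spaces(text):
    """Remove whitespace adjacent to a CJK character, keeping spaces between Latin words."""
    return _SPACE_NEXT_TO_CJK.sub("", text)


sample = "我 哋 去 Hong Kong 玩"
print(strip_cjk_spaces(sample))
```

To use it with the pipeline, pass `output[0]['generated_text']` to `strip_cjk_spaces` in place of the `.replace(' ', '')` call.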

	### Framework versions

	- Transformers 4.26.0.dev0
	- Pytorch 1.13.1
	- Datasets 2.8.0
	- Tokenizers 0.13.2