Genji-JP 6B

See our blog post for more details, samples, and evaluations.

Model Description

Genji-JP 6B is EleutherAI's GPT-J 6B model finetuned on our Japanese storytelling dataset. This particular model is trained on Japanese web novels.

Hyperparameter       Value
n_parameters         6,053,381,344
n_layers             28*
d_model              4,096
d_ff                 16,384
n_heads              16
d_head               256
n_ctx                2,048
n_vocab              50,400 (same tokenizer as GPT-2/3)
position encoding    Rotary position encodings (RoPE)
RoPE dimensions      64

* Each layer consists of one feedforward block and one self-attention block.

The model consists of 28 layers with a model dimension of 4096 and a feedforward dimension of 16384. The model dimension is split into 16 heads, each with a dimension of 256. Rotary position encodings (RoPE) are applied to 64 dimensions of each head. The model is trained with a tokenization vocabulary of 50257, using the same set of BPEs as GPT-2/GPT-3; the n_vocab value of 50,400 is the size of the embedding matrix, which is padded beyond the 50257 tokenizer entries.
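
The relationships between these hyperparameters can be checked with a few lines of arithmetic. The sketch below copies the values from the table; the weight-count formula is a common rule of thumb and an assumption on our part (it ignores biases and layer norms and assumes separate input and output embeddings), so it only approximates the n_parameters figure rather than reproducing it.

```python
# Hyperparameters copied from the table above.
n_parameters = 6_053_381_344
n_layers = 28
d_model = 4096
d_ff = 16384
n_heads = 16
d_head = 256
n_vocab = 50400
rope_dims = 64

# The model dimension is split evenly across the attention heads.
assert n_heads * d_head == d_model

# The feedforward block is four times as wide as the model dimension.
assert d_ff == 4 * d_model

# Fraction of each head's dimensions that receive rotary position encodings.
rope_fraction = rope_dims / d_head
print(f"RoPE covers {rope_fraction:.0%} of each head")

# Rough weight count (assumption: biases and layer norms are ignored,
# and the input embedding and output head are stored separately).
attention_weights = 4 * d_model * d_model   # Q, K, V and output projections
feedforward_weights = 2 * d_model * d_ff    # up and down projections
embedding_weights = 2 * n_vocab * d_model   # input embedding and output head
estimate = n_layers * (attention_weights + feedforward_weights) + embedding_weights
gap = abs(estimate - n_parameters) / n_parameters
print(f"estimated weights: {estimate:,} (relative gap to table: {gap:.3%})")

# Memory for the weights alone at 2 bytes per parameter (float16).
fp16_gib = n_parameters * 2 / 2**30
print(f"float16 weights: about {fp16_gib:.1f} GiB")
```

The last figure covers the weights only; activations and the attention cache used during generation need additional memory.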

Training data

GPT-J 6B was pretrained on the Pile, a large-scale curated dataset created by EleutherAI for the purpose of training this model. After pretraining, it was finetuned on our Japanese storytelling dataset. See our blog post for more details.

How to use

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Genji-JP reuses the GPT-J 6B tokenizer.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
# Load the weights in half precision and move the model to the GPU.
model = AutoModelForCausalLM.from_pretrained("NovelAI/genji-jp", torch_dtype=torch.float16, low_cpu_mem_usage=True).eval().cuda()
# Prompt: a synopsis line, a "***" separator, then the opening of the story.
text = '''あらすじ：あなたは異世界に転生してしまいました。勇者となって、仲間を作り、異世界を冒険しよう！
***
転生すると、ある能力を手に入れていた。それは、'''

tokens = tokenizer(text, return_tensors="pt").input_ids
# Sample a continuation of up to 400 tokens beyond the prompt.
generated_tokens = model.generate(tokens.long().cuda(), use_cache=True, do_sample=True, temperature=1, top_p=0.9, repetition_penalty=1.125, min_length=1, max_length=len(tokens[0]) + 400, pad_token_id=tokenizer.eos_token_id)
last_tokens = generated_tokens[0]
# Strip U+FFFD replacement characters left by incomplete byte sequences.
generated_text = tokenizer.decode(last_tokens).replace("\ufffd", "")
print("Generation:\n" + generated_text)

When run, this produces output like the following:

Generation:
あらすじ：あなたは異世界に転生してしまいました。勇者となって、仲間を作り、異世界を冒険しよう！
***
転生すると、ある能力を手に入れていた。それは、『予知』だ。過去から未来のことを、誰も知らない出来事も含めて見通すことが出来る。
悪魔の欠片と呼ばれる小さな結晶を取り込んで、使役することが出来る。人を惹きつけ、堕落させる。何より、俺は男なんて居なかったし、女に興味もない。……そんなクズの片棒を担ぎ上げる奴が多くなると思うと、ちょっと苦しい。
だが、一部の人間には協力者を得ることが出来る。目立たない街にある寺の中で、常に家に引きこもっている老人。そんなヤツの魂をコントロールすることが出来るのだ。便利な能力だ。しかし、裏切り者は大勢いる。気を抜けば、狂う。だから注意が必要だ。
――「やってやるよ」
　アーロンは不敵に笑った。この
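
The usage example samples with top_p=0.9, and temperature=1 leaves the model's distribution unscaled. As a rough illustration of what nucleus (top-p) filtering does to a next-token distribution, here is a minimal pure-Python sketch. The function names and the toy probabilities are hypothetical, and this is not the transformers implementation, which works on logits and may differ in edge cases.

```python
import random


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose total probability reaches top_p."""
    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        cumulative += probs[token_id]
        if cumulative >= top_p:
            break
    # Renormalise the surviving probabilities so they sum to 1.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


def sample(probs, top_p=0.9, rng=random):
    """Draw one token id from the filtered, renormalised distribution."""
    filtered = top_p_filter(probs, top_p)
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]


# Toy next-token distribution over five token ids (illustrative values only).
toy_probs = [0.5, 0.3, 0.15, 0.04, 0.01]
nucleus = top_p_filter(toy_probs, top_p=0.9)
print("token ids kept:", sorted(nucleus))
print("sampled token id:", sample(toy_probs, top_p=0.9))
```

Lowering top_p shrinks the set of candidate tokens and makes the continuation more conservative; raising it toward 1 admits rarer tokens.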

Acknowledgements

This project was possible because of the compute provided by the TPU Research Cloud.

Thanks to EleutherAI for pretraining the GPT-J 6B model.

Thanks to everyone who contributed to this project!
