Qwen2.5 32B for Japanese to English Light Novel translation

This model was fine-tuned on light and web novels for Japanese to English translation.

It can translate entire chapters (up to 32K tokens total for input and output).

Usage

Load in llama.cpp
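
A minimal way to serve the model with llama.cpp (the GGUF filename here is a placeholder; substitute the quantization you downloaded):

```shell
# Placeholder filename -- use the actual GGUF file you downloaded.
# -c 32768 requests the full 32K context window the model supports.
llama-server -m lightnovel-translate-Qwen2.5-32B-Q4_K_M.gguf -c 32768
```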

Prompt format

<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Example:

<|im_start|>system
Translate this text from Japanese to English.<|im_end|>
<|im_start|>user
<GLOSSARY>
γƒžγ‚€γƒ³ : Myne
</GLOSSARY>
γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ<|im_end|>
<|im_start|>assistant
Myne, Lutz is here to take you home.

The glossary is optional. Remove it if not needed.
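
The ChatML-style template above can be assembled in plain Python; a minimal sketch (the `build_prompt` helper is not part of this repo):

```python
# Builds the prompt in the format shown above.
SYSTEM = "Translate this text from Japanese to English."

def build_prompt(user_text: str) -> str:
    # The trailing "<|im_start|>assistant\n" leaves the turn open
    # for the model to generate the English translation.
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(build_prompt("γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ"))
```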

Text preprocessing

The Japanese text must be preprocessed with the following clean_string function, which replaces some Unicode characters with ASCII equivalents. Skipping this step may degrade translation quality.

import ftfy

FTFY_ADDITIONAL_MAP = {
    "β€”": "--",
    "–": "-",
    "βΈ»": "----",
    "Β«": "\"",
    "Β»": "\"",
    "〝": "\"",
    "γ€Ÿ": "\"",
    "✧": "*",
    "✽": "*",
    "⬀": "*",
    "⭘": "*",
    "∴": "*",
    "∡": "*",
    "✩": "*",
    "【": "[",
    "】": "]",
    "γ€Œ": "[",
    "」": "]",
    "γ€–": "[",
    "γ€—": "]",
    "γ€ˆ": "<",
    "〉": ">",
    "γ€Š": "<<",
    "》": ">>",
}

def clean_string(text: str, strip: bool = True) -> str:
    config = ftfy.TextFixerConfig(normalization="NFC")
    s = ftfy.fix_text(text, config=config)
    s = "\n".join((x.strip() if strip else x.rstrip()) for x in s.splitlines())
    for b, g in FTFY_ADDITIONAL_MAP.items():
        s = s.replace(b, g)
    return s

Glossary

You can provide up to 30 custom translations for nouns and character names at runtime. Prefix your chapter with glossary terms, one per line in the form Japanese term : English term, inside <GLOSSARY></GLOSSARY> tags.

For example, if you wish to have γƒžγ‚€γƒ³ translated as Myne, you can construct the input prompt with:

glossary = [
    {"ja": "γƒžγ‚€γƒ³", "en": "Myne"},
]
chapter_text = "γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ"

def make_glossary_str(glossary: list[dict[str, str]] | None) -> str:
    if glossary is None or len(glossary) == 0:
        return ""
    # Deduplicate term pairs and sort for a stable prompt.
    unique_glossary = sorted({(term["ja"], term["en"]) for term in glossary})
    terms = "\n".join(f"{ja} : {en}" for ja, en in unique_glossary)
    return f"<GLOSSARY>\n{terms}\n</GLOSSARY>\n"

user_prompt = f"{make_glossary_str(glossary)}{clean_string(chapter_text)}"

This produces:
<GLOSSARY>
γƒžγ‚€γƒ³ : Myne
</GLOSSARY>
γƒžγ‚€γƒ³γ€γƒ«γƒƒγƒ„γŒθΏŽγˆγ«ζ₯γŸγ‚ˆ