Violet-yo/mt5-small-ft-Chinese-Braille

Text2Text Generation

Inference Endpoints

Violet-yo committed on Jun 30, 2024

Commit 9c09662 (verified) · 1 Parent(s): 445e6dd

Update README.md

Files changed (1)

README.md +40 -0

@@ -7,3 +7,43 @@ tags:
 - braille
 ---
 # MT5-Small-FT-Chinese-Braille

+<p align="center">
+  📃 <a href="https://arxiv.org/" target="_blank">[Paper]</a> • 💻 <a href="https://github.com/AlanYWu/ChineseBrailleTranslation" target="_blank">[GitHub]</a> • 🤗 <a href="https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone" target="_blank">[Dataset]</a> • ⚙️ <a href="https://huggingface.co/Violet-yo/mt5-small-ft-Chinese-Braille" target="_blank">[Model]</a> • 🎬 <a href="https://vision-braille.com/" target="_blank">[Demo]</a>
+</p>
+This model is `mt5-small` fine-tuned on the [Chinese-Braille-Dataset-10per-Tone](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone) dataset to translate Chinese Braille into Chinese text. The training code is available in the [GitHub repository](https://github.com/AlanYWu/ChineseBrailleTranslation).
+## Inference
+```python
+import evaluate
+from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+# Braille input and its reference translation (one sample line from the dataset).
+braille_text = "⠼⠓⠙⠁⠃⠉⠊ ⠓⠶⠞⠼ ⠚⠴⠺ ⠤ ⠘ ⠌⠢ ⠛⠊ ⠝⠩ ⠳⠬ ⠊⠓⠑ ⠛⠕⠛⠫ ⠵⠪ ⠵⠼⠛⠫ ⠟⠥⠅⠷⠐ ⠊⠛⠡ ⠃⠔ ⠌⠲⠛⠕ ⠛⠩⠱⠖ ⠙⠢ ⠟⠥⠅⠷⠇⠭ ⠃⠥⠟⠲ ⠱⠦⠇⠪⠐ ⠙⠧⠱ ⠃⠡ ⠍⠮⠳ ⠙⠖ ⠛⠕⠱⠼ ⠙⠢ ⠟⠼⠙⠥ ⠐⠆"
+ground_truth = "841239\t黄腾 认为 ： “ 这 几 年 由于 一些 国家 在 增加 出口 ， 已经 把 中国 减少 的 出口量 补充 上来 ， 但是 并 没有 到 过剩 的 程度 。\n"
+# Load the fine-tuned model and its (slow, SentencePiece-based) tokenizer.
+model = AutoModelForSeq2SeqLM.from_pretrained("Violet-yo/mt5-small-ft-Chinese-Braille")
+tokenizer = AutoTokenizer.from_pretrained("Violet-yo/mt5-small-ft-Chinese-Braille", use_fast=False)
+# Tokenize the Braille text, truncating inputs longer than 280 tokens.
+inputs = tokenizer(
+    braille_text, return_tensors="pt", max_length=280, padding=True, truncation=True
+)
+# Generate the translation with beam search.
+output_sequences = model.generate(
+    input_ids=inputs["input_ids"],
+    attention_mask=inputs["attention_mask"],
+    max_new_tokens=300,
+    num_beams=5,
+)
+translated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
+print(f"{translated_text=}")
+print(f"{ground_truth=}")
+# Score the prediction with sacreBLEU, loaded by name from the Hugging Face Hub;
+# pass a local path such as "models/metrics/sacrebleu" instead if you keep an offline copy.
+metric = evaluate.load("sacrebleu")
+results = metric.compute(predictions=[translated_text], references=[[ground_truth]])
+print(f"{results=}")
+```
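To make the Braille input in the inference example easier to inspect, the following is a minimal sketch that lists the raised dots of each cell. It relies only on the Unicode Braille Patterns block, where a cell's code point is `U+2800` plus a bit mask in which bit *n*-1 stands for dot *n*. The helper names `braille_dots` and `describe` are hypothetical and are not part of this model or its training code.

```python
def braille_dots(cell: str) -> list[int]:
    """Return the raised dot numbers (1-8) encoded by one Unicode braille cell."""
    offset = ord(cell) - 0x2800
    if not 0 <= offset <= 0xFF:
        raise ValueError(f"{cell!r} is not a Unicode braille pattern")
    # Bit n-1 of the offset is set when dot n is raised.
    return [dot for dot in range(1, 9) if offset & (1 << (dot - 1))]


def describe(braille_text: str) -> list[tuple[str, list[int]]]:
    """Pair every braille cell in the text with its dot numbers, skipping whitespace."""
    return [(ch, braille_dots(ch)) for ch in braille_text if not ch.isspace()]


# First three cells of the input used in the inference example.
sample = "\u283c\u2813\u2819"
for cell, dots in describe(sample):
    print(cell, dots)
```

For instance, the leading cell `⠼` (U+283C) decodes to dots 3, 4, 5 and 6, the Braille number sign, which is why the sample line begins with digits in the reference text.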
+## Resources
+- Homepage: [Vision-Braille](https://vision-braille.com/)
+- Repository: [GitHub](https://github.com/AlanYWu/ChineseBrailleTranslation)
+- Paper: [arXiv](https://arxiv.org/)
+- Hugging Face: [Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone), [Model](https://huggingface.co/Violet-yo/mt5-small-ft-Chinese-Braille)
+  - [Full Tone Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-Full-Tone)
+  - [No Tone Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-No-Tone)