Violet-yo committed on
Commit 9c09662 · verified · 1 Parent(s): 445e6dd

Update README.md

  1. README.md +40 -0
README.md CHANGED
@@ -7,3 +7,43 @@ tags:
  - braille
  ---
  # MT5-Small-FT-Chinese-Braille
+ <p align="center">
+ 📃 <a href="https://arxiv.org/" target="_blank">[Paper]</a> • 💻 <a href="https://github.com/AlanYWu/ChineseBrailleTranslation" target="_blank">[GitHub]</a> • 🤗 <a href="https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone" target="_blank">[Dataset]</a> • ⚙️ <a href="https://huggingface.co/Violet-yo/mt5-small-ft-Chinese-Braille" target="_blank">[Model]</a> • 🎬 <a href="https://vision-braille.com/" target="_blank">[Demo]</a>
+ </p>
+
+ This model is `mt5-small` fine-tuned on the [`Chinese-Braille-10per-Tone`](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone) dataset. The training code is available in the [GitHub repository](https://github.com/AlanYWu/ChineseBrailleTranslation).
+
+ ## Inference
+ ```python
+ import evaluate
+ from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
+
+ # Sample Braille input and its reference translation (the reference keeps the record ID, tab, and trailing newline).
+ braille_text = "⠼⠓⠙⠁⠃⠉⠊ ⠓⠶⠞⠼ ⠚⠴⠺ ⠤ ⠘ ⠌⠢ ⠛⠊ ⠝⠩ ⠳⠬ ⠊⠓⠑ ⠛⠕⠛⠫ ⠵⠪ ⠵⠼⠛⠫ ⠟⠥⠅⠷⠐ ⠊⠛⠡ ⠃⠔ ⠌⠲⠛⠕ ⠛⠩⠱⠖ ⠙⠢ ⠟⠥⠅⠷⠇⠭ ⠃⠥⠟⠲ ⠱⠦⠇⠪⠐ ⠙⠧⠱ ⠃⠡ ⠍⠮⠳ ⠙⠖ ⠛⠕⠱⠼ ⠙⠢ ⠟⠼⠙⠥ ⠐⠆"
+ ground_truth = "841239\t黄腾 认为 : “ 这 几 年 由于 一些 国家 在 增加 出口 , 已经 把 中国 减少 的 出口量 补充 上来 , 但是 并 没有 到 过剩 的 程度 。\n"
+ model = AutoModelForSeq2SeqLM.from_pretrained("Violet-yo/mt5-small-ft-Chinese-Braille")
+ tokenizer = AutoTokenizer.from_pretrained("Violet-yo/mt5-small-ft-Chinese-Braille", use_fast=False)
+
+ # Tokenize the Braille input, truncating it to at most 280 tokens.
+ inputs = tokenizer(
+     braille_text, return_tensors="pt", max_length=280, padding=True, truncation=True
+ )
+ # Decode with beam search (5 beams), generating up to 300 new tokens.
+ output_sequences = model.generate(
+     input_ids=inputs["input_ids"],
+     attention_mask=inputs["attention_mask"],
+     max_new_tokens=300,
+     num_beams=5,
+ )
+ translated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
+ print(f"{translated_text=}")
+ print(f"{ground_truth=}")
+ # Score the prediction with sacreBLEU. The path below loads the metric from a local directory;
+ # evaluate.load("sacrebleu") fetches the standard sacreBLEU metric from the Hugging Face Hub instead.
+ metric = evaluate.load("models/metrics/sacrebleu")
+ results = metric.compute(predictions=[translated_text], references=[[ground_truth]])
+ print(f"{results=}")
+ ```
+
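The `ground_truth` string in the inference example carries a numeric record ID, a tab, and a trailing newline in addition to the sentence itself. As a minimal sketch, assuming every record follows that `<numeric id><whitespace><text>` layout (an assumption drawn from this single example, not from the dataset documentation), the text part can be isolated for a quick sanity check before computing metrics. The helper names `split_record` and `same_text` are hypothetical and are not part of the repository.

```python
def split_record(line: str) -> tuple[str, str]:
    """Split '<numeric id><whitespace><text>' into (id, text).

    Assumes the layout seen in the README example; lines without a
    leading numeric ID are returned as ("", stripped_line).
    """
    stripped = line.strip()
    parts = stripped.split(maxsplit=1)  # splits on any whitespace, including tabs
    if len(parts) == 2 and parts[0].isdigit():
        return parts[0], parts[1]
    return "", stripped


def same_text(prediction: str, reference: str) -> bool:
    """Compare the text parts of two records, ignoring runs of whitespace."""
    pred = " ".join(split_record(prediction)[1].split())
    ref = " ".join(split_record(reference)[1].split())
    return pred == ref


if __name__ == "__main__":
    record_id, text = split_record("841239\t黄腾 认为 :\n")
    print(record_id)
    print(text)
```

An exact-match check like `same_text(translated_text, ground_truth)` complements the sacreBLEU score: it reports only whether the sentence was reproduced verbatim, so it is best read as a coarse sanity check rather than a quality metric.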
+ ## Resources
+ - Homepage: [Vision-Braille](https://vision-braille.com/)
+ - Repository: [GitHub](https://github.com/AlanYWu/ChineseBrailleTranslation)
+ - Paper: [arXiv](https://arxiv.org/)
+ - Hugging Face: [Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone), [Model](https://huggingface.co/Violet-yo/mt5-small-ft-Chinese-Braille)
+ - [Full Tone Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-Full-Tone)
+ - [No Tone Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-No-Tone)