Update README.md
README.md
@@ -7,3 +7,43 @@ tags:
- braille
---
# MT5-Small-FT-Chinese-Braille
<p align="center">
📃 <a href="https://arxiv.org/" target="_blank">[Paper]</a> • 💻 <a href="https://github.com/AlanYWu/ChineseBrailleTranslation" target="_blank">[GitHub]</a> • 🤗 <a href="https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone" target="_blank">[Dataset]</a> • ⚙️ <a href="https://huggingface.co/Violet-yo/mt5-small-ft-Chinese-Braille" target="_blank">[Model]</a> • 🎬 <a href="https://vision-braille.com/" target="_blank">[Demo]</a>
</p>

This model is `mt5-small` fine-tuned on the [`Chinese-Braille-Dataset-10per-Tone`](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone) dataset to translate Chinese Braille into Chinese text. The training code is available in the [GitHub repository](https://github.com/AlanYWu/ChineseBrailleTranslation).

## Inference

```python
import evaluate
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Braille input and its reference translation (ID, tab, word-segmented sentence, newline).
braille_text = "⠼⠓⠙⠁⠃⠉⠊ ⠓⠶⠞⠼ ⠚⠴⠺ ⠤ ⠘ ⠌⠢ ⠛⠊ ⠝⠩ ⠳⠬ ⠊⠓⠑ ⠛⠕⠛⠫ ⠵⠪ ⠵⠼⠛⠫ ⠟⠥⠅⠷⠐ ⠊⠛⠡ ⠃⠔ ⠌⠲⠛⠕ ⠛⠩⠱⠖ ⠙⠢ ⠟⠥⠅⠷⠇⠭ ⠃⠥⠟⠲ ⠱⠦⠇⠪⠐ ⠙⠧⠱ ⠃⠡ ⠍⠮⠳ ⠙⠖ ⠛⠕⠱⠼ ⠙⠢ ⠟⠼⠙⠥ ⠐⠆"
ground_truth = "841239\t黄腾 认为 : “ 这 几 年 由于 一些 国家 在 增加 出口 , 已经 把 中国 减少 的 出口量 补充 上来 , 但是 并 没有 到 过剩 的 程度 。\n"

# Load the fine-tuned model and its slow (SentencePiece-based) tokenizer.
model = AutoModelForSeq2SeqLM.from_pretrained("Violet-yo/mt5-small-ft-Chinese-Braille")
tokenizer = AutoTokenizer.from_pretrained("Violet-yo/mt5-small-ft-Chinese-Braille", use_fast=False)

# Tokenize the Braille input, truncating it to at most 280 tokens.
inputs = tokenizer(
    braille_text, return_tensors="pt", max_length=280, padding=True, truncation=True
)

# Generate with beam search (5 beams), producing up to 300 new tokens.
output_sequences = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=300,
    num_beams=5,
)
translated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)
print(f"{translated_text=}")
print(f"{ground_truth=}")

# Score the prediction with SacreBLEU. The original snippet loaded a local copy
# from "models/metrics/sacrebleu"; "sacrebleu" loads the standard metric instead.
metric = evaluate.load("sacrebleu")
results = metric.compute(predictions=[translated_text], references=[[ground_truth]])
print(f"{results=}")
```
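
The `ground_truth` string in the example above appears to consist of a numeric ID, a tab, a word-segmented sentence, and a trailing newline. Assuming that layout holds for other samples (this is inferred from the single example here, not from dataset documentation), a minimal sketch for separating the ID from the sentence could look like the following. The helper names `split_reference` and `unsegment` are hypothetical and not part of the repository.

```python
def split_reference(line: str) -> tuple[str, str]:
    """Split an '<id>\\t<sentence>\\n' line into (id, sentence).

    Assumption: the ID and sentence are separated by the first tab.
    If no tab is present, the whole line is returned as the sentence.
    """
    stripped = line.rstrip("\n")
    sample_id, sep, sentence = stripped.partition("\t")
    if not sep:
        return "", stripped
    return sample_id, sentence


def unsegment(sentence: str) -> str:
    """Remove the whitespace between segmented words."""
    return "".join(sentence.split())


sample_id, sentence = split_reference("841239\t黄腾 认为\n")
print(sample_id)            # ID part of the line
print(sentence)             # word-segmented sentence
print(unsegment(sentence))  # sentence with word spacing removed
```

Whether to strip the ID before scoring depends on how you want to evaluate: the Braille input in the example also begins with a number, so the model output may include the ID as well.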

## Resources

- Homepage: [Vision-Braille](https://vision-braille.com/)
- Repository: [GitHub](https://github.com/AlanYWu/ChineseBrailleTranslation)
- Paper: [arXiv](https://arxiv.org/)
- Hugging Face: [Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-10per-Tone), [Model](https://huggingface.co/Violet-yo/mt5-small-ft-Chinese-Braille)
- [Full Tone Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-Full-Tone)
- [No Tone Dataset](https://huggingface.co/datasets/Violet-yo/Chinese-Braille-Dataset-No-Tone)