---
license: apache-2.0
datasets:
- xmj2002/Chinese_modern_classical
language:
- zh
pipeline_tag: translation

---

The pretrained base model is [fnlp/bart-base-chinese · Hugging Face](https://huggingface.co/fnlp/bart-base-chinese).

The model converts modern Chinese into classical Chinese. It was trained in the same way as a translation task.

## Hyperparameters

- batch size: 32
- epochs: 5
- learning rate: 5e-5

Because the dataset is large, only 100,000 samples were used for training (the full dataset contains 970,000 samples).
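To make the scale of this setup concrete, the sketch below works out how many optimizer steps the stated hyperparameters imply and shows one possible way to pick the 100,000-sample subset. It is only an illustration: the card does not say how the subset was chosen, so the random draw, the `seed` value, and the helper name `pick_subset_indices` are all assumptions, as is the premise that all 100,000 samples were used for training on a single device without gradient accumulation.

```python
import math
import random

TOTAL_SAMPLES = 970_000   # size of the full dataset, as stated above
SUBSET_SIZE = 100_000     # samples used for training, as stated above
BATCH_SIZE = 32
EPOCHS = 5


def pick_subset_indices(total, k, seed=0):
    """Return k distinct row indices drawn uniformly at random.

    Hypothetical helper: random selection and the seed are assumptions,
    not the author's documented procedure.
    """
    rng = random.Random(seed)
    return rng.sample(range(total), k)


indices = pick_subset_indices(TOTAL_SAMPLES, SUBSET_SIZE)

# Assumes every subset sample is a training sample and one batch = one step.
steps_per_epoch = math.ceil(SUBSET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
fraction_used = SUBSET_SIZE / TOTAL_SAMPLES

print(steps_per_epoch, total_steps, f"{fraction_used:.1%}")
# prints: 3125 15625 10.3%
```

If part of the subset was held out for validation, the real step count would be somewhat lower than this estimate.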

## Usage
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Task prefix prepended to the input ("Mandarin to classical Chinese").
prefix = "普通话到文言文"
tokenizer = AutoTokenizer.from_pretrained("xmj2002/bart_modern_classical")
model = AutoModelForSeq2SeqLM.from_pretrained("xmj2002/bart_modern_classical")

# Modern Chinese input text.
text = "曲曲折折的荷塘上面，弥望的是田田的叶子。叶子出水很高，像亭亭的舞女的裙。"
inputs = tokenizer(prefix + text, return_tensors="pt").input_ids

# Sampling is enabled, so the result differs from run to run.
outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Example output: 曲 塘 之 上 ， 弥 望 则 田 田 之 叶 ， 叶 出 水 高 ， 若 舞 女 低 裙 。
```
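
The decoded text in the example output has a space between every character. If a plain sentence is preferred, a small post-processing step can remove that whitespace. The helper name `strip_token_spaces` below is hypothetical, and the sketch assumes the output contains no whitespace that should be preserved.

```python
def strip_token_spaces(decoded: str) -> str:
    """Remove all whitespace between characters in a decoded string."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(decoded.split())


sample = "曲 塘 之 上 ， 弥 望 则 田 田 之 叶 ， 叶 出 水 高 ， 若 舞 女 低 裙 。"
print(strip_token_spaces(sample))
# prints: 曲塘之上，弥望则田田之叶，叶出水高，若舞女低裙。
```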