<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Install

```bash
pip install peft transformers bitsandbytes
```

# Run with transformers

```python
from transformers import TextStreamer, AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit (via bitsandbytes) and attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained("alpindale/Mistral-7B-v0.2-hf")
mis_model = AutoModelForCausalLM.from_pretrained(
    "alpindale/Mistral-7B-v0.2-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
mis_model = PeftModel.from_pretrained(mis_model, "svjack/DPO_ZH_RJ_EN_ORPO_Mistral7B_v2_lora_small")
mis_model = mis_model.eval()

# Stream tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)

def mistral_hf_predict(prompt, mis_model=mis_model,
                       tokenizer=tokenizer, streamer=streamer,
                       do_sample=True,
                       top_p=0.95,
                       top_k=40,
                       max_new_tokens=512,
                       max_input_length=3500,
                       temperature=0.9,
                       repetition_penalty=1.0,
                       device="cuda"):
    # Truncate the prompt and wrap it in the Mistral chat template.
    messages = [
        {"role": "user", "content": prompt[:max_input_length]}
    ]
    encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
    model_inputs = encodeds.to(device)

    generated_ids = mis_model.generate(model_inputs,
                                       max_new_tokens=max_new_tokens,
                                       do_sample=do_sample,
                                       streamer=streamer,
                                       top_p=top_p,
                                       top_k=top_k,
                                       temperature=temperature,
                                       repetition_penalty=repetition_penalty,
                                       )
    # Keep only the assistant's reply: everything after the final [/INST] tag,
    # with the trailing </s> end-of-sequence token removed.
    out = tokenizer.batch_decode(generated_ids)[0].split("[/INST]")[-1].replace("</s>", "").strip()
    return out

out = mistral_hf_predict("如何对语料进行有效的翻译?",
                         repetition_penalty=1.1,
                         temperature=0.01,
                         max_new_tokens=1024
                         )
print(out)
```
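The decoding step in `mistral_hf_predict` assumes the Mistral instruction format, where the user turn is wrapped in `[INST] … [/INST]` and generation ends with `</s>`; the reply is whatever follows the final `[/INST]` tag. A minimal sketch of that extraction on a hypothetical decoded string (the text below is illustrative, not real model output):

```python
# Hypothetical decoded string in the Mistral chat format (illustrative only).
decoded = "<s>[INST] 如何对语料进行有效的翻译? [/INST] 这取决于语料的领域和目标语言。</s>"

# Same post-processing as in mistral_hf_predict: take everything after the
# final [/INST] tag and strip the end-of-sequence token.
reply = decoded.split("[/INST]")[-1].replace("</s>", "").strip()
print(reply)  # 这取决于语料的领域和目标语言。
```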
# Output

```txt
当然,我很乐意帮助您了解如何有效地将语料翻译成英语。以下是一些关于这个主题的建议:

1. 选择适合您目标受众和需求的翻译服务。有许多可用的翻译服务,包括机器翻译、人工翻译和混合翻译。根据您的预算、时间限制和质量要求来决定最适合您的选项。
2. 准备好您的语料。确保您的文本清晰、简洁并且符合您的目标受众。这样做可以使翻译过程更加容易,并提高质量。
3. 选择一个专业的翻译服务。确保您选择一个具有良好声誉和经验的翻译服务。查看他们的客户评论、证书和凭证,以确保他们能够为您提供高质量的翻译。
4. 提供详细的指导方针。向翻译服务提供关于您的目标受众、风格和任何特殊要求的信息。这可以帮助翻译服务更好地理解您的需求,并提供更准确和相应的翻译。
5. 审查翻译。请随时审查翻译,并与翻译服务沟通以确保它满足您的需求。这可以帮助您确保翻译准确、清楚并符合您的目标受众。
6. 考虑修改。在审查翻译后,考虑修改以符合您的风格和需求。这可以帮助您创造一个独特的内容,同时保持其准确性。
7. 测试翻译。在发布之前,测试您的翻译,以确保它在不同设备上都显示得正确。这可以帮助您避免任何错误或问题。
8. 记录翻译。记录您的翻译,以便您可以在未来重新使用。这可以节省时间和金钱,并确保您的内容始终是准确的。
9. 跟踪翻译的进展。跟踪您的翻译的进展,以确保它按计划完成。这可以帮助您避免任何延迟或问题。
10. 评估翻译的质量。评估您的翻译的质量,以确保它符合您的需求。这可以帮助您确保您的内容准确、清楚并符合您的目标受众。
```

# train_2024-05-16-11-01-44
This model is a fine-tuned version of [alpindale/Mistral-7B-v0.2-hf](https://huggingface.co/alpindale/Mistral-7B-v0.2-hf) on the dpo_zh_reject_en dataset.